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ABSTRACT 


An Interactive scene Interpretation system (ISIS) Is beinrr 
developed by Stanford Research Institute's Artificial Tntellloence 
Center as a tool for constructinc and experimentina with man-machi ne 
and automatic scene analysis methods tailored for particular imaoe 
domains. This report describes a recently developed reoion analysis 
subsystem based on the Paradlqm of Brice and Rennema, Usino this 
subsystem a series of exoerlments was conducted to determine aood 
criteria for Initially partitionino a scene into atomic regions and 
for mergino these renlons into a final partition of the scene alone 
obiect boundaries. Semantic (problem-dePendent) icnowiedge la 
essential for complete* correct partitions of complex real-world 
scenes. An Interactive approach to semantic scene seementatlon was 
developed and demonstrated on both landscape and indoor scenes. This 
aonroach provides a reasonable methodoloay for segmentino scenes that 
cannot be processed completely automatically at ‘present* and is a 
promislno basis for a future automatic system. The report also 
describes a program that can automatically Generate strateoies for 
finding specific objects in a scene based on manually designated 
Pictorial examples. 
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INTRODUCTION 


This report (Jescrlbes recent research in Interactive . scene- 
analysts usino • the Interactive Scene interpretation System tTSTSi 
developed at SRT, our overall oblectlves are to! Cl) devise 
interactive- methoHol oql es tor rapidlv nroqramiring computers to 
recoenlze oblects in particular real»world pictorial domains^ and- C2) 
develop techniques -for cooperative scene anlysls whereby humans can 
.provide the computer with outdance when completely automated 
orocesslnq is infeasible. 

Scene analysis is still more of an art than a science. The 
field is characterized bv a few general principles and a growing 
collection of ad hoc techniques perfected, laroely by trial and 
error, for particular domains. Even experienced researchers have 
difficulty predicting which, if any, of these techniques apply to a 
new domain, ISTS was oriolnally developed to helo researchers refine, 
their intuitions. Data can be observed on a qray scale or color 
display as it is perceived by the computer, Prlirltive operators can 
be applied to selected areas of the scene to determine directiv fin 
numerical terms) which features provide the best discrimination aneno 
particular oMects, Oblect finding strategies can be formulated in 
terms of these features and tested on-line, by requesting the system 
to illuminate all Instances observed in a displayed scene, it was 
possible to construct strategies for findlno each of the princlp'al 
surfaces in a slpole roo.m scene (e,a.,, floor, tabletop, chairseat, 
and wall) in a matter of hours [IJ, These strateqies were based 
on primitive operators for extracting hue, saturation, height, and 
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surface orientation from correlated arrays of color and simulated 
ranee data-. Attribute descriptions fe,a.» the color buff and the 
orientation horizontal) were. developed in a similar fashion and used 
to express additional oblect finding strategies in symbolic terms, 

ISIS Is a step toward more natural pictorial communication 
with machines. ideally, one would like to prooram a computer to 
find unfamiliar objects as one would instruct a person by pointing at 
examples, or by providing crude verbal descrlntions A‘ tree, for' 
example, might be described as a "oreen. leafy reoton above a tall, 
brown, vertical, barV-textured region," Unfamiliar concepts such as 
"green" dr "leafy" could In turn be defined bv pointing at examples. 
The computer mlaht demonstrate comprehension by outlining instances 
of the described object in a displayed imaoe. The pronrammer could 
■then refine the description empirically to correct errors in the 
computer‘'s interpretation. While we are still far from achieving 
this ideal, some significant progress has been made, 

1,1 Summary of Pesearch Progress 

This project has addressed -several major limitations 

of the initial version ot ISIS described in Reference 1. First, the 

preliminary system lacked dlsolay capabilities, primitives, and scene 

« 

segmentation technloues needed to function effectively in natural 
(l,e’,‘, outdoor) scenes. Second, while providing helpful tools. It 
left the major programming burden- with the user. Third, the system 
provided no capabilities for high-level graphical communication such 
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as "pointing," 


Our ' ability ' to observe data as they appear to the 
computer was significantly' enhanced by the acouisiti-on of a' color 
image display, The display was particularly helpful in investioatino 
color and' texture discrimination in landscape scenes, nnlivre 
Painted ob'lects found in -room scenes-, most naturally occurring 
oblects cannot be distinguished solely on the oasis of locally 
Sampled attributes such as hue and saturation. Therefore# outdoor 
scenes must be partitioned Into coherent regions so that global 
attributes (e,g,# texture and shape) and soatlal relations with other 
regions fe,g,# adlacenev# vertical oosl-tion# and the like') can he 
used. 

Various syntactic’ and semantic techolgues for' scene 
parti-tioning were experimentally evaluated using ISIS, A -new 
Interactive- approach to semantic scene segmentatlor wgs demonstrated 
on both landscape and indoor scenes. This approach is of ' interest 
both as a reasonable way to segment scenes that at present cannot be 
processed completely automatically and as a promising base for a 
future automatic system. 

Progress was made in auto"»atlno ' the ' interactive 
generation of -strategies for finding ob'lects 'in room scenes-,' 
Spepifleally# a ' program was developed which accepts pictorial' 
examples, of obiecf surfaces (e,'g,, floor# wall, tabletop# and ether 
oblects) outllned'on a -dlspjay screen ny 'a human tral'ner, and 
attempts to design a good strategy for selecting samples of those 
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surfacps In future Ineoes, The nroora'*' emin^tes t>e empirical 
approach of a hunan trainer? the exampi.e is first rharacterl teH in 
ter’ns of feature extraction operators, fc set of features is selected 
that dlstlnaulshes the exaipple fron examples-of previously leareed 
concepts. Finally# a corresponding predicate is • oenerated ana 
applied to random Inane sample points. The selected feature set Is 
emnlrlcallv refined, if necessary, to exclude sample points not 
belondina to the exa^ole# or to include omitted samples of the 
example. The human trainer can nuide the machine's 
experimentation by describing the new oniect in terms of attributes 
of nreviousiy characterized opdects or py dlrectiv suooestlnq which 
operators to try, 

Desiopatin*? examples bv pointlno with a cursor is 
usually mere natural than drawing a detailed outline, eSPeclallv for 
spatially amorphpus obiects such as trees. However, oolnrlno is 
intrinsically ambloupus* the trainer could be deslonatlno anythino 
from the Particular picture element to the entire Picture,' 'rpe 
machine must first ouess the example that It Is expected to describe,- 
Two pointing inference mecha^'lsms were devised, one based on region 
growing for use in outdoor scenes and the other on automatic strategy 
generation for use in indoor scenes, in the first instance, the 
region grower was modified to grow outward from designated starting 
Points .inside and nutslde the oolect of interest. To the second 
case, a strategy was generated specifically for distinguishing 
between samples designated to be on an ohiect of interest and those 
designated to be on adiacert obieets. In both cases, after the, 
computer's guess is displayed, the trainer can elaborate on .his 
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intent, by tieslanating additional sanries on and off the obiect. 
The ouesslnq process then iterates until trainer and computer agree. 

The interactive techniques described in this report 
are not yet integrated into a single system. Moreover, many of the 
scene analysis results have yet to be rigorously tested in a ■ large 
number of scenes. The prelect has recently reached a very fertile 
experimental stage, as indicated. by the fact that new ideas for scene 
analysis are occurring much more frequently than ideas for improving 
the system. 


1,2 Development of OLI.SP Languaoe 

ft portion of this proiect has supported development 
of a high"level programming language called OLISP, The OLISP 
language provides most of the features of Qa 4 embedded within the 
I'JTERLISP svstem. Thus, QLISP provides a rich varietv of data 

types, a data base for content-directed retrieval of expressions, a 

Dowerful pattern matching capability, oattern-direeted function 

\ 

invocation, and a mechanism for manipulating data contexts. These are 
all available In a programming environment that provides a versatl'ie. 
Lisp-oriented editor! an easv-to-use file pae>cage for malntalnino 
symbolic files! a "programmer's assistant" which allows commands to 
be Undone or aiteredj and an error correction system which can. fix 
many simple user errors autom.atlcallv. 

Although QI.ISP has not been used as the orogrammino 
language for the werx described elsewhere In this reeort, the 
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requirements of that worlc have provided direction to the effort to 
streamline a sophisticated but costly lanquaae Into an efficienf tool 
for problem solvino# 

Durino the period covered by this report* the basic 
features of the lanquaqe were implemented and incorporated into a 
rather reliable and easy-to-use packaoe, A compiler has been 
Implemented* and other modifications have been made to improve the 
lanquage's. efflcieney* Complied programs written In QLISP run about 
30 to 50 times faster than the corresponding QA4 proorams. 

The most serious drawback to QLISP is that it tends 
to force the programmer who uses it into an unnecessarily restricted 
framework,. Thus* for example* QLISP's data storage and retrieval 
statements are Implemented In such a way as to prohibit the use of 
the deta base as a network of items that could be easily traversed. 
Furthermore* the "grain” of the available statements Is sometimes too 
coarsej i,e,* it is not always possible to specify precisely hew a 
retrieval operation should be carried out. These are the primary 
reasons why it was decided net to implement ISIS in QLISP, 

However* QLlsP has been shown to be a useful tool for 
problem solving* simulation* and automatic programming. 
Therefore* work- will continue* under other supoort* to further 
Improve and streamline this language. During the coming year* we 
Plan to Install a new pattern matcher wMch can oerform unification 
(matching two patterns* both of which contain variablesj as well as 
do simple -matches more quickly, we will install a new data- base 
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mechaplsr tnat will allciv "'ore efficient access of related structures 
in the associative store, win Investjoate the extension of the 
associative store to secondary storaae. Finally, when the 
Bobrow-weahreit control stacic formalism Is incoroorated ‘ into 
IMTERMSP, w’e Will add facilities f or- nseudo-oata 1 le 1 nrocessinn to 
OTjTSP, 


OIiIPP is available over toe SRpfl net, and has already 
been used on an exoerimehtal basis at several sites atound th» 
network • 
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2 


SYSTFM DEVEIOPMENT 


A substantial amount of effort durlno the oast year was 
devoted to systeirt development, A simple dloltlztnq device for 

Dhotoaraphlc transparencies was constructed, A color display was 
acouired and interfaced* necessltatlnq the development of b.oth low- 
and hlah«level color display software, > The 'or lo Inal ISIS system 

described In Reference 1 was extensively overhauled to Improve both 
efficiency and memorv utilization, 

2,1 Hardware 

2,1.1 .Slide Oioltlzer 

A direct view slide dlqitizer was constructed 

r 

uslna a sheet -of translucent oola’<ote as the optical Interface. 
Imaaes viewed throuah a orourd olass interface had undesirable arain, 
and Imapes viewed on a conventional oroiectlon screen had dull* 
unsaturated color, A Ijl field lens was also tried as a direct view 
interface. It provided brilliant colors hut the imaae was too small 

N _ 

for the zoom optics built into our camera. 

Pictures were .diaitlzed at a sanDlina 
resolution of 120 X 120 elements, 5 bits/color. The camera aaln and 
iris were peaked for each color filter, so that the hrlohtest white 
in the scene (usually- a cloud) was barely saturated. This strateov 
optimized the contrast and siqnal to noise for each filter over most 
Of- the briohtness ranoe and also comoensated for varvir.o filter 
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densities^ sn that white oroduceri a unitor"'- resoonse 


There are well-known problems In iislno 
Vidlcon television cameras for photometric measurements t21. The 
dynamic hriahtness ranoe of tvplcal outdoor scenes# whether vieven 
llve or on s3-ldes, nenerally exceeds the available iOOsl dvnamic 
ranae of most television cameras. Quantitative comparisons of 
brlohtness observed throuqh multlole color filters are frustrated bv 
spectral variations in taroet sensitlvitv and by automatic oaln 
circuitry built into the camera electronics. One way to avoid these 
oroblems Is to dlaitize slides with a mechanical scanner uslho a 
ohotomuitipiier , Time can then be dlrectiv traded for any desired 
combination of sensitivity# intensity resolution# sinral/noise# 
dynamic ranoe, and so forth, Lacklna such hardware ourselves, we 
Were fortunate to acouire over the APPA net a small library of 
accurately dloitlzed • scenes from researchers at farneaie ‘-’elion. 
University, These imaqes were orloinally dioiti^ed on a Muir-head 
drum at the University of Southern California Imaqe Processlno- 
Laboratory, The orlqinal data contained P bits/color at a soatiai 
resolution of 600 X e20. These data wore subsequently resampied' 
down to 240 X 240 resolution. Some of these i-maoes were used in 
reolon orowipa experiments reported later, 

2,1,2 Color Display 

A RA^'TSK* GXlOO color Video dismay was 
acquired and interfaced, rePlacina an Adane vector dl splav used ir 
earlier ISIS versions ,' The Ramtek system conslslts of an 18 inch color 
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TV monitor refreshed throoqh D/A converters by a series shift 

registers, Fach Individually addressable shift realster for mpriorv 
Diane) stores J hit of brightness- information for each discrete 
raster eleinent. The svsten ourchased hv SPI contains a total of 14 

memory planes which can be configured to broduee two basic 

capabilities 5 

tt) Display a 25fi X 256 element color Image* each 
element dlsolayed uslno 4 bits of intensity 
information each for red# green* and blue. 

Two overlays* bright red and briobt green* are 
available for superimposing vector araohics, 
cursors* and so on without disturbing the 
underlying image, 

(2) Display a 256 X 256 element grey scale image* 

each element displayed with 8 bits of brightness 
on a blacic and white monitor. 

These capabilities represent an economic balance between the soat.ial 
and brightness resolution available from our sensors and the cost of 
additional memory, Solid state memorv w'as chosen sa as to avoid 
difficulties in synchronitina multiple color planes because of skew 
and stability errors inherent In disk refreshed systems. 

All of our woric to date has been performed 
With the color monitor, although we anticloate several future 
applications that will benefit from the high resolution grey scale 
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*#The cost of each memory Plane for 2*16 X ?56 spatial resolution '•■as 
approximately $1 »Opo. Me-nory cost increases linearly with additional 
brldhtness resolution and as the souare of soatial resolution. Thus, 
an additional bit of color resolution for each orimary color adds 
S3 , 000 to the total cost, while doublino spatial resolution to *5i? X 
512 at 4 bit/conr Quadruples mei-orv cost to s4fl,noo. 


•Ramtetf Corporation, Sunnyvale, California 940q6. 
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(c) 


<d) 


TA-8721-6 


FIGURE 1 (a) SYNTHETIC COLOR TRIANGLE; (b) SRI OFFICE SCENE; (c) LANDSCAPE 

SCENE (MONTEREY, CALIFORNIA); Id) CARNEGIE-MELLON OFFICE SCENE 


13 







2.2 


Software 


Th* core of ISIS consists of the following basic 
software modulest 


« Graphics supoert 

* Primitive feature extraction operators 

* Iconic and symbolic data structures 

* General application programs (filtering, 
scanning, region growing, region classifying, 
statistics gathering, and so forth). 

These components are being continually modified in an upward, 
compatible manner to improve efficiency and to Increase the system's 
generality. Two Improvements in the Past year were particularly 

noticeable. First, the FORTRAN data structure used for storing 
region and sample descriptions has been replaced by a more compact 
and efficient structure Implemented using special features of 
INTERLISP, Second, filtering and similar computatlonallv expensive 
functions are now executed efficiently In FORTRAN, This has been 
made possible by software that allows arrays of Image samples as well 
as conjunctive filter predicates to be communicated between LiSP and 
FORTRAN, 


A device-independent graphics pacitage was Implemented 
which allows displays with alphanumerlcs , points, vectors, grey 
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5cal^» and f'ljl c«lrr data to be devaion^d in LISF, FPRTPAN, or SMf. 
and displayed at run tl!i»e on either t^e Pantetc, Adaie, or 
Hewiett-Pactfard displays suniect to hardware caoabi I It les , np to lOn 
dlSDlaV lists can he defined# which are manipulated Independent i v. 

Specialized support routines tor the Pamteif allow 
arbitrary sized color Inatje arrays to he snown at a specified scale 
and location on the display screen. Msers can thus create rontaoes 
of Imaaes at various staoes of nrocesslno. Alphanumeric and vector 
overlays are automatically scaled to correspond with underlylno imaoe 
displays, Graphics can he outout on any nemorv plane# althouch 
usually the red overlay niane is used, informative color oraohies 
such as harorarhs# hlstoorams# false color Imane renditions, and so 
forth# can he synthesized conveniently with reolnn eolorina routines. 
Fxampies lllustratlno the use ot these caoanlllties aonear throuohout 
this report, hetailed uranhles software documentation Is available 
from the authors. 
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PPJMTTIVF OFFHATPRS 


Scene aralypls requires nrlwltlve nrerators tbat ran 


dlst Inaii 1 5h arom objects lo a domain of Interest# A combination ot 
local attributes such as bue, helqht, and orientation provides 
adehuate discrimination to simple room scenes# mir -,rit In landscapes. 
Natural colors tend to be weatflv saturated with broad spectral peaks 
(See Flqures 2a#b), dost of the poundafv Irformatlon usually 
associated with color Is actually caused pv hriqhtness alone (Flqure 
20, Heloht and orientation are more characteristic of plane surfaced 
room furniture than of Irreoularly contoured surfaces In natural 
scenes, (Tt Is also m^cn harder outdoors to oofain the ranne data 
needed for measurlno these features.) Analyses of natural soeoes 
conseouently must use additional, dlonal features derived Primarily 
from visual data. The remainder of this section describes 
exoerlmantal primitives tor three olooal fnatures: texture, share, 
and spatla] context. 


nonhcmooenenus reoions a coherent aooearanco. Texture arises from 
reoular (possibly statistical) variations in local attributes (sucb 
as brlobtness, hue, and saturation) and exists at all levels of 
description. Thus# a hillside of trees, a treetoo, and the surface 
of a slnnle leaf each have cnaracterlstic textures, nne Is usuaiiv 


3,1 


Texture 


1,1,1 Intf o'^uct 1 on 


Texture Is an ill-defined pronerty that olves 
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FIGURE 2 





(e) 


(fl 


TA-8721-30 



(a,b) COLOR CODED SCATTER PLOT OF HUE VERSUS SATURATION FOR 
REGIONS DESIGNATED IN FIGURE 2b; (c) LANDSCAPE SCENE FROM FIGURE 
1c WITH BRIGHTNESS INFORMATION REMOVED (BRIGHTNESS 
NORMALIZATION ACHIEVED BY DIVIDING THE RED, GREEN, AND BLUE 
COLOR COMPONENTS OF EACH PICTURE ELEMENT BY THE SUM OF THOSE 
COMPONENTS); (d) LANDSCAPE SCENE SAMPLED TO 40 x 40 RESOLUTION; 
(e) 20 DEGREE QUANTIZATION INTERVALS DISPLAYED ON COLOR 
TRIANGLE; (f) LANDSCAPE SCENE WITH EACH SAMPLE DISPLAYED IN THE 
COLOR OF THE 20 DEGREE INTERVAL TO WHICH IT IS CLASSIFIED 
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most aware of the aross texture of the smallest scene elements 
Involved In an articulated description. 

The texture detail available In a dlaitlzed 
Image depends on viewing scale (i.e.. magnification) and spatial 
sampling density. The level of texture needed to distinguish, say* a 
region of grass from a region of water has net been clearly 
determined. It would, of course, be preferable to rely on macro 
texture where grass might be characterized as a region containing 
green, brown, and yellow patches. One could then sample very coarsely 
(e.g,. 40 X 40 or less) and avoid the mass of data needed to 
characterize the Individual blades, we thus concur with Hanson and 
Rlseman 13) who questioned Bajcsy's arguments for using a 
quantization grid several times finer than the finest texture element 
[4], Figure 2d shows the same scene as Figure tc (120 X 120 
resolution), sampled at 40 X 40 resolution. Viewed at a distance of. 
say. 15 feet, this figure Is quite recognizable. Therefore, 
qualitatively It Probably contains sufficient texture data to 
characterize Its component objects. 

Texture. unll>ce brightness and hue. Is not a 
monolithic attribute and cannot generally be expressed by any single 
functional representation. Investigators have used a wide variety of 
features to classify or distinguish particular textures, but have 
been unable to formalize how features should be selected for a 
particular class of scenes, ISIS Is a useful tool for determining 
aoproorlate texture features exoerlmentallyi operators can be applied 
to selected regions of a scene Individually and in weighted feature 
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vectors to test fUscriminatlont 

3,1.2 Textore Ooerators 

Ne are accuTulati*»cr a llbrarv of contnonlv 
useH texture operators Droce<?ures (Ta^iio 3,15, These ooeretors 

are croiioed into two classes* fnirro»textures and nacro^textures . 
Micro-textures are characterized by statistical distributions of 
briohtness, hue, and ' saturation at the blcturo element “level, 
Macro»*textures are composed of ^rouoinos of elementarv regions (i,e,» 
regions of homogeneous color and brightness) and are characterized bv 
distributions - of s.haoe, density, sbatlal arrangement, and 
micro-texture of those regions. 
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Table 3,1 


A BASIC t.IBpARY OF TFXTUPE OPEPATHPS 
Micro Textures 


1, Distribution -statistics for brightness, hue, and saturation 
(tavcen directionallv or nondlre.ct lonallvl j 

a, ' Wean 

b, Mode 

c, -Standard deviation 

d, SVrev 

e, , Kurtosis 

2, Features derived from directional • soatiaJ dePendencv 
arrays [51 t 

a, Fnerov 

b, .Entropy 

*3, Edoe density and directional edoe- density fvertlcal edoe 
<3en5ity/hoPiTontai edge density! Cf-1 : 

*4, Statistics derived from Fourier oower and phase spectrum f4]! 

a. Peav rower, blmode, and so on, 

b, , Mncjulstic features? nonodi rectlonal , linear, 

bidirectional, blobs, and so on. 


♦Macro Textures 


Statistics derived from distributions of elementary r»oion nrooerties 
C7] t 

a. snare (based on moments! 

b. Size (perimeter, area) 

c. Density (average number of regtons/unlt area)' 


♦These operators nave not vet been imoiementea on our system. 
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3-,l,3 Texture Displays 


A nun'ber* of soecial displays have been 
formulated to help visualize the effectiveness of various texture 
oDerators, An experimenter can circle two reqions of the scene and 
obtain superimposed histograms of hue» saturatiop* and brightness 
(see Figure 3), He can also obtain superimposed two-dimensional 
Scatter plots of hUe versus saturation for multiple regions, (The 
yellow and red scatter points in Figure 2b correspond respectively to 
the yellow and red regions designated In Figure 2a.) He can dlspiav 
the original Image sampled to an arbitrary resolution (Figure 2d), 
Furthermore, each sample can be shown with the average or modal 
brightness or hue of the surrounding area (Figures i, 5). Spatial 
dependency arrays (giving the probability of encountering, a 
transition between two specified brightness levels when scanning the 
image horizontally, vertically, or diagonally) can be displayed for 
selected regions as two-dimensional brightness arravs rather than as 
large arrays of meaningless numbers (Figure 6K 

3,1,4 Discussion 

A systematic evaluation of te«ture operators 
is not one of our oblectlves, although a facility such as ISIS would 
be helpful In such an undertaXino, From a cursory examination. It 
appears that statistics for spatial texture implemented so far do not 
discriminate substantially better than a comparison of brightness, 
hue. or saturation distributions using standard tests of’ significance 
such as chl-sguare or Kolmogorov-sml rnov (KS), Kncouraging 
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(a) 



(b) 

TA-8721-4 

FIGURE 3 (a,b) OVERLAPPING HISTOGRAMS OF BRIGHTNESS, HUE, AND SATURATION 

FOR REGIONS OUTLINED IN FIGURE 3B 
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TA-8721-8 


FIGURE 4 SAMPLED LANDSCAPE SCENE WITH EACH SAMPLE DISPLAYED AT THE 
AVERAGE BRIGHTNESS OVER A 3 x 3 NEIGHBORHOOD 
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FIGURE 5 SAMPLED LANDSCAPE SCENE WITH EACH SAMPLE DISPLAYED AT THE 
MODAL BRIGHTNESS OVER A 3 x 3 NEIGHBORHOOD 
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(c) REGION 2 (d) REGION 3 

TA-a?21-10 


FIGURE 6 SPATIAL DEPENDENCY ARRAYS FOR REGIONS DESIGNATED IN FIGURE 6a 
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HUE, SAT, OR GRAY LEVEL 

TA-8721-5 


FIGURE 7 MODIFIED KOLMOGOROV — SMIRNOV SIMILARITY CRITERIA: 
TWO CUMULATIVE DISTRIBUTIONS ARE COMPARED ON THE 
BASIS OF AVERAGE MAGNITUDE DIFFERENCE, SHOWN SHADED 

(After Muerle and Allen [8] ). 
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exper imenta J. results ^leve been obtaired uslm a rrodi f teat Ion of the 
KS criteria first crooosed by ^■‘uerie and aiipn fp] and illustrated In 
Flqure 7, 

A proPilslnd future direction is the use of 
ISIS to deduce ad hoc characteristics tnat dlstinouish particular 
textures in a limited scene domain. For examnle* a reaion mluht he 
adeqviately char acter 1 zed, at the micro-texture level, slmoly as the 
set of .samples with a Prescribed Droximity to samples havino a 
dlstlnouished hue (the detailed hue distribution of the selected 
DOints ■ beino uni moortant 1 , In one particular scene, regions of sky 
and lake both contained manv samp-ies with virtually Identical biiie 
hues. However, in the lake, the blue samples were liberal Iv 
interspersed with distinctive oreen samoles. At the macro-texture 
level, a .reciion could be described In terms of distlnguisblno 
attributes of comoon'ent reolons, as whan descriotno arass as a reoion 
containing oreen, yellow, and brown blobs. A particularly simple 
macro-teXture descriptor is the numner or density of smaller reojons 
contained in a standardized window, in Firure for instance, the 
river and bushes are partitioned into many small reolons, while 
grass, skv, and trees are represented by a tew laroe reolons. 

Although texture should somedav significantly 
enhance color discrimination, it win not alwavs allow unloue 
interpretation. Reolons of ground, dark tree bark, and treetop in 
Floure Ic are virtually JndistinauJshahie to the human eve when 
viewed throuoh small slits in a mask. The same confusion aonUes 
to reolons of mountain and sea and of rock and lioht tree hark. 
Clearly, shaoe an^ spatial context are also necessary. 





TA-8721-n 


FIGURE 8 RIVER SCENE SEGMENTED INTO 350 RELATIVELY HOMOGENEOUS REGIONS, 
ILLUSTRATING THE USE OF REGION DENSITY AS A TEXTURE FEATURE 
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3.2 


Shape 


Shape# llKe texture, is ni^t a monniithic attribute 
and therefore is not amenable to universal representation. Table 
3.2 lists some two dlirensional shape primitives. So far» onlv the 
first two prlfnitives have been imolenented. The ratio of (v.ertlral 
perimeter /horizontal perimeter) was tested and proved effective In 
dlstinouishlno thin, vertically elonnated renlons of "treebark" from 
non-elonoated reoJons of "oround" and "treetop" in partitioned 
landscape scenes, (Sections of iaooed houndarv with horizontal or 
vertical extent less than two were lane.red In enmoutlna this ratio.) 
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Table 3.2 

SHAPE PRIMITIVES FOR REGTDNS 
Global Shabe 

1. Compactness t fperlmeter squareH/area) 

2, Orientatloni (Vertical Derimeter/horlzontal oerlmater) 

3, Aspect Ratioj ( 1 enqth/wiatb of tightest boun’dlnc 
rectanq-le) 

4. • Moments 

5>, 'Sveleton characteristics (e.o.i hranchino structure! 

C9l 

Pounrfary Descriptors (location and alobal) 

a. Curvature Points ClOl 

b, Fourier expansion of boundarv curve ni» 123 

c. Chain code features tl3] 

d, Llnauistic descriptions (14, la] 
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3,3 


Spatial Relations 


Soatlal context is an Important factor In resolvlro 

' V 

Interpretation ambiouitles. Procedural representations have been 
implemented for some common three dimensional soatlal relationships 
between two reoions, based on the relative world coordinates of 
vertices in their polyonnal boundaries. These representations are 
described in Table 3,3a and demonstrated in Table 3,3h usino the test 
rectens In Ficure 9, 


The above representations were orlolnallv developed for room 
scenes, assumino availability of ranae data. The accuracy of our 
developmental ranaefinder has, thus far, been dtSapP-ointino: 6 
inches/lh feet, How'evsr, ranoe accuracy is much less critical In 
determlnlno olohal relations than In determinlna local attributes, 
such as surface orientation. Moreover, all of the relations excpot 
for Planarity can he reformulated in terms of two-dimensional imaue 
coordinates for standard eye level views. 


mm An page is 

m POOR fiUALTTXi 
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Table 3. 3 A 


REPRESENTATIONS FOR SPATIAL RELATIONS BETWEEN TWO PLANAR SURFACES 
GIVEN 2 REGIONS — A (c.g.j a KorltonCal Chair Seat) and B (c.g,, a Vertical Chair Bach) 

/ 


y 

/ 

/ 

/ 

/ 

/ 

U X 

WORLD COORDINATES 


REClON B 
(VERTICAU 





TA 


1 Left of/Rtgbt of 

Let Axaln, Axzaax ■ slnlsiun and paxioua X iaage coordinates of boundary points of Region A 
Bxnln, B'caax - mlnlmm and 08X1000 X inage coordinates of boundary points of Region B 

then 

A Region A is Left of Region 6 

^lf£ Bxnln + Bxnax > Axnln Axoax 
and Bxoln > Axaax or 

Max (Axoln, Bxnln) - Axnax - 1 ^ 

Min (Axsiax - Axoln, Bxoax -« Bxinln) +1 

(The last condition provides a reasonable Interpretation of the concept “Left" in cases vdiere Regions A and B par- 
tially overlap ) 

B. Region A Is right of Region 6 

iff Bxnln Bxoax u Axzain + Axetax 
and Axoln > Bxoax or 

Min (Axmax. Bxoax) - Axain 4 - 1 
Min (Axnax • Axnln, Bxnax - Bxoln) '>*1 


2 Be low/ Above 

Let Atoln % Azoax ■ height at naxlouni and mlnlouin Y image coordinates of boundary points of Region A (horizontal surface) 
Bzoln, Bznax - height at oaxiouo and nlnlnuo Y image coordinates of boundary points of Reslon B 

then 

A. Region A Is below Region B 

iff Bzoin + Bzoax ' Azotn + Azmax 
and Bzmn > Aznax or 

Max (Azmtn. Bzoin) - Aznax - 1 
Min (Azoax - Azotn, Bzoax - Bzoin) + 1 ^ * 

B. Region A is abo\o Region 6 

iff Bzoin + Bznax ^ Aznin + Aztaax 
s and Azoln > Szi^ax or 

Sin CAznax. Bznax) - Azraln + I ^ 

Sin (Azmax - Azraln, Bznax - Bzmln) + I ** * 

(An additional relttion, diroctl> abovo/dlroctly below nay be defined by requiring that the regions Involved not be 
CO Che right, left. In front, or tn bick of each other.) 


3, Front/Baeh 

Let Amin. Amax > raaxlraura and nlnlraura range of boundary polnts'of Region A 

Broln, Btmax ■ ooximura and olnicajo range of boundary points of Region B (vertical surface) 

then 

A. Region A Is In front of Region B 

iff Broln + Bmax > Analn + Armax 
and Brraln > Amax or 


Max (Acoln. Brain) - Arraax - I ^ ^ 

Min (Acoax - Armin, Bmox - Bmln) + I 


3 Front /Back (Concluded) 

B. Region A Is In back of Region B 

iff Brraln Brraax £ Amin + Arraax 
Tnd Analn > Bmax or 

Min (Araax. Braax) - Analn + 1 ^ ^ 

Sin (Armx - Amin, Bmax - Bmln) + 1 

4, Coplan a r 

Let PLA, ■ least square planar surface flc^co boundary points of Region A 
PLB, " lease square planar surface fit to boundary points of Region B 

then 

Region A and Region B are coplanar Iff the following criteria hold: 

(1) The surface rromals of PLA and PLB must be parallel to within 10 degrees, 

(2) Each local plane cust intercept the saae coordinate axis (X, ¥, or Z) closest to the origin. 
P) These (taost reliable) intercepts must agree within 10% 

« (These above criteria compensate hcurlstlcally for uncertainties in range data.) 
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PELATIOMS OF St'HFAfKS Tw FIGtIRF <) li.'Sl’Jr. PFPRFSF'JTATIO J.S TAHlF 
(Table lists relations o* obieet 1 to nblect 71 

where A 3 above F s 1*1 front L s left 

PL 3 nelow PK s lo baci< P = rlont 
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FIGURE 9 REGIONS USED TO TEST SPATIAL RELATIONS (TABLE 3-b) 
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4, A PEGTOM AKAtYSlS SUBSYSTEM FOR ISIS 
4,1 Introduction 

Most analyses of natural scenes first attenot to 
oartitlon the inaqe Into co’^erent reolons corresoendlnq to known 
ob1ectst4, 16, 17, }fl], Reqlons provide a convenient basis for 
semantic analysis by. reducing both the amount of detail and the 
ambtouitles of interpretation found at the Picture element level. 

Several models of reaion analysis have appeared in 
the literature, Rottom-un .methods heotn ny exhausti velv partitlonino 
a scene into elementary atonic reqlons and proceed by meroino 
tooeth’er adjacent reaions -with weak common boundaries 117, 1^1. 
Top-down methods beain with a slnale reoion eneoiroassinq the. entire 
imaqe and successively seqment It into smaller,, homogeneous 
subregions .t20j , . These methods can, he comnioed with increased 
efficiency by alloving both merqina and segmentation to proceed from 
an initial Intermediate level partition obtained bv imposing an- 
arbitrary orid t?1, 221, Regions can also be orown indepen-dentlv 
from starting kernels selected with semantic Information [23], 

This section - presents the region analysis 
capahl 1 It'ies being deveiopd tor ISI.S, '■Je first evaluate some 
nonsemantic region analysts techniques .we and other researchers have 
developed and then discuss further improvements based on seman.tics 
Coroblem-dependent knowl edoel • and user, interaction.. 


35 



4.2 


Basic Model 


Rpalon analysis procedures ware developed for ISIS 
uslna a bottom-iip naradlom evolved fron* Brice/Fennema, There are two 
principal staaes of processing: first partition and region growing. 
The Durposp of first partition is. to obtain an Initial segmentation 
of the image which Is conservative in the sense that each reoion 
consists of ‘picture elements oelonglno to only one oMect, In the 
region growing stane* adlaeent regions with ‘similar characteristics 
are merged into a single region to further simplify the organization. 
The desired result is a set of regions corresponding to distinct 
obiects in the scene. 

The most conservative first partition is obtained bv 
making every Picture element a separate reoion. However, region 
analysis- is computationally expensive, making it*-- infeasible to. 
process this amount of detail in a reasonable amounr of time. The 
common practice is to first samole the Picture to reduce resolution 
and then immediately merge adlaeent samples with identical 
characteristics, Brlce/Fenema sampled" a '120 X 12 p image with'ifi 
levels of brightness down to 60 X 60 resolution, and then combined 
adlaeent el,ements with eoual brightness, yielding about 1 ,.ooo 
elementary regions in a blocks ^orld scene, ’'iher data are finelv 
auantized or multidimensional Cl.e,, color and range data are 
available), demanding strict eguality of all attributes can lead to 
an unnecessarily large number of. regions, many caused by guantization 
noise. In such cases it may help- to first classify each sampia into 
a small number of categories and then treat picture elements assigned 
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to tbe sa>ne cateaory as lfient5cal, Yaici’''ovsVfV* for exampT#r sorterf 
each Image sample into one of 25 color categories ‘ covering 
distinctive combinations of hue and Saturation in his outdoor' scenes 
C17], Lleberman obtained an adeauate partition of two simple 
landscapes uslno only seven categories of hue [4i , Experiments Have 
been conducted with several sampling and quantization schemes for 
obtaining first partitions of landscaoe and office scenes* and a few 
methods have' been proved empirically to be adeauate, nur findinos 
are reported in Section 4,4, 

Pegibn arowlnq proceeds ny serially selecting the 
pair of adjacent regions In' the current' organization which are 
globally "most alike*" and meraina these into a sinaje region, Tbe 
order in which reolons are merged Is determined hv a function that 
compares the similarity of a given oair of adjacent regions with the 
similarities of other pairs of regions that remain as candidates -to 
be merged, A variety of criteria for region simiiaritv have been 
used, including average brightness contrast til, Ihl and average 
color contrast ri7], Brlce/F ennema als'o used a-welobtlno factor that 
Weakened the effective contrast of regions that meet along meandetino 
boundaries* a common characteristic of guan.Hratlon contours ClPl.'Tn 
our system the function that determines simiiaritv (and hence merge 
priority) of adjacent regions I's a separate program "odule allowing 
experimentation with a variety of criteria that mav Include problem 
dependent considerations, Experimental results ’with various 
similarity functions are oresented in Section 4,5, 

Various algorithms have been used to control the 
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order and extent of "’erqlnci, Brlce/Fenne.ma neroed boundaries In 
arbitrary order so lono as the welohted strenoth was below an 
exper Iwental ly deter'^tnod absolute threshold. The resultino 
oartition was denendent or the order of meroino. This nromoted 
Yaklrovsky to meroe boundaries on a weakest-first basis# terminatino 
when either an absolute difference threshold or a minlmup' nuifber of 
reolons Was exceeded. Another method# prooosed ny Preuder rzAl , was 
to iperoe .two reoions# without regard for absolute houn'^ary strenoth, 
whenever both were more similar to each other than to any other 
neiohbors. The reoien orowino procedures developed for ISIS follow 
YakibovsVy's .meroe strateoy bv keeo.lm nairs of adlacent reolons on a 
oriority queue# and always first meroino across the oiobaliy weakest 
boundary. 


The basic reolon analysis process is illustrated by 
an example depicted in Floures ,10 through 13,. The first partition 
staoe is based on the briohtness Values associated with a 40 X .40 
sampllno of the image, Reolon pairs are added to the orlorttv 
oueue# usino a simliarltv measure based on averaoe absolute 
briahtness and color difference across their common boundaries, 
Reolon pairs are then removed from the queue one at a time# in order 
of similarity, and a meroe performed in the reolon data structure. 
The- first Partition (Floure lO) yielded 006 initial reolons (half as 
many .reolons as Individual picture elements), -The results of 
subseouent neroino are illustrated with 600, 450, and 250 reoions 
remainino. 


All reolon-based techniques, that effectively analyze 
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TA-8721-13 


FIGURE 11 PARTITIONED LANDSCAPE SCENE AFTER 206 MERGES (600 REGIONS 
REMAINING) 
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FIGURE 12 PARTITIONED LANDSCAPE SCENE AFTER 150 ADDITIONAL MERGES 
(450 REGIONS REMAINING) 
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TA-8721-15 


FIGURE 13 PARTITIONED LANDSCAPE SCENE AFTER 200 ADDITIONAL MERGES 
(250 REGIONS REMAINING) 
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nontrivial scenes mavre use of context -sansiti.ve r.»les base.d on 
problen’-relat-ed vnovledcje #. Me will refer to such r*’les under the 
general headi-no of semantics. 

Semantics have been introduced into meroina decisions 
in a variety of ways. The Br Ice/Fennema weaknes’s heuristic' contains 
an Irmilclt semantic statements edges of oblects are straight in the 
blocXs . world, ftfter an Initial ohase of merotno based on 

nonsemantlc similarity criteria, Yatclmnvsifv's prooram continues 
merqino wl.th a semantic . measure of> boundary strenoth used to set 
merge priorities. This measure used a Bayesian model to compute t^e 
lixellhood that two adlacent regions had the same interpretations,, 
based on both their individual attributes- Ce.o., color, size, «nd 
texture) as well as on attributes of their common boundary (e,o,, 
length, shape, orientation), 

Lieberman's program, aHer grouping samol-es Into 
reol.ens of homogeneous, hue. Identified the largest regions and then 
invoiced .semantic (oMect dependent) procedures to extend and refine 
boundaries. For exa.mple, regions thought to he sicv were extended 

to adjoining regions having a homogeneous texture, while raoiens 
thought to be ground were extended to adiaeent regions having a 
vertically receding texture gradient,, • T^e Harlow and Eisenbeis 
program used semantic knowledge (primarily expected brightness and 
image location- of each region) to select starting locations for 
regions corresponding to the. principal anatomic features in a cnest 
x-rav C23] , Initial reoions were formed nv Indenendentlv ■ morning 
starting kernels with adiaeent samples of similar brightness. The 
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similarity thresholrf differe<? for each feature deoerujlna on exDecred 
brightness variations. This stage terininated when the initial 
regions had reached a slonlflcant size» but before they orew 


together, suhseguent merging 

was 

constrained 

by a 

set of 

semant Ic 

"Structure rules" that tried 

to 

ensure that 

the 

final 

regions 


compiled with known spatial relatlonshlos amono the features,' 

Semantic^ considerations are' hv definition related to' 
soecific problems or Picture domains, and hence thev are Inherentlv 
ad hoc. For this' reason, man-machlne Interaction Is the appropriate’ 
mechanism for Interlectlno' semantic Information that guides the 
region growino process and for experimenting with different semantic 
rules • and determinino empirically their' usefulness In a. speclMc 
problem domain, 'with an interactive facility, the researcher . can 
propose a set of semantic rules and instruct the system to ‘continue 
processing until a specified state of tne world or error condition is 
reached. At this point, he can Investigate the properties of r<*alnns 
and their relationships and modilfy the knowledge base guiding the 
analysis. It is hare that we feel the region aralvsls techniayes 
developed for TSiS constitute a significant original contribution to 
the field. 


4,3 Interactive Features 

Some interactive control features are provided to aid 
In .the formulation of effective region growing strategies, and in the 
selection of appropriate stopping criterl.a. Merges can be performed 
in "step" -mode, one merge at a time, allowing the dynamic 
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characteristics of a prJoritv function to be observed. Pronosed 
merges are indicated on the dlsolav prior to execution. The user mav 
then simulate, alternative merqino criteria bV modJfvino the oueue 
manually.. For example* realons can be selectively deleted from the 
queue, thus becomina unmeroeable, The user can study the dynamics 
of a Particular erroneous merqe hV reouestlnn tne system to enter 
step mode whenever a merae is orooosed within a deslunated 
rectanauiar window. 

The ability to build and modify the nriorltv queue- 
interactively is also useful for exoeriments In cooperative' 
Cman»machlne 1 reolon analysis. For instance, the queue nav oe 
originally built with a few manually selected realons and their 
"neiohbors" Cadlacent realons), Those reaiens then serve as 
"Vterrels" from which all subsequent new realons oro-<. 
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4.4 


First Partition Experiments 


Tne ebleetlve of first oartitlon Is to oroup together 
adlacent picture samples with similar attributes, so as to obtain the 
fewest initial regions without risking a false merge. tn the 
introductory examole, the olcture was first sampled to 40 X 40 
resolut-lonj then adlacent samples with identical brightness were 
combined to form homogeneous atomic regions fFigure 10), This 
section reports on attempts to obtain Improved first partitions using 
different sampling functions and different criteria for judging the 
similarity of adjacent samples. 

4,4,1 Experiments on Sampling 

Processing time considerations dictate that 
region analysis be performed at the lowest resolution that preserves 
important details in the imaoe. The experiments reported in this 
section compare first partitions obtained with straight sampling, 
modal sampling, and mean sampling. All experiments were performed- on 
gray scale images using a 40 X 40 rectangular sampling grid, (In 
scenes with periodic texture^ a random samplino strategy is 
recommended to avoid aliasing effects,) In modal sampling (Figure 
14)» the gray level of each grid Point is taken to be the most 
frequently occurring value of gray level for nearby points. The 
number of Initial regions obtained in this manner is significantly 
reduced (by about one^third), because a lot of small "noise" points 
In the treetop and ground disappear. However* the fine detail In the 
small central branch of Che<^tree Is lost, and the separation between 
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FIGURE 14 FIRST PARTITION OF LANDSCAPE SCENE PRODUCED FROM MODAL 
SAMPLES OF FIGURE 5 {536 REGIONS} 
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thp oround and the treetrunic are lost bv the modal smoothl'no, 
samolina the mean oray-level in a small neighborhood around each grid 
Doint is a poor technique because of its extreme tendency to smooth 
discontinuities as seen in Figure 4, 

We concluded that first oartltions based on a 
Simple Sampling of the gray scale image taken through a neutral' 
density filter contain all of the Informati'on needed to 
conservatively Partition most real Indoor and landscape scenes. This 
conclusion is supported by the absence of most boundaries in images 
Which have had brightness information removed bv normalization 
(Figure 2c), 


4,4,2 Experiments on Classification 

The experiments presented above relied solely 
on brightness Information for subdividing the imaoe. The resulting 
Darti’tions seem overeonservative# with many oblects unnecessarily 
fragmented. Since the region growing stage is ouite time consuming/ 
we were motivated to see whether additional sensorv modalities (i,e,, 
color and ranoe data) could be used in conlunction with brightness to 
reduce the number of initial regions, 

4,4,2a Nonsemantlc Classification 

In the~followlng experiments/ the hue 
of each picture sample was quantized into 20 degree intervals as 
Shown in Figure 2e. (The- hue of each samnie is expressed as a 
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spectral ancle from o to 360 degrees based on a transf ormatlon of the 
raw color data given In ftobendlx 8 of Reference 1*^ Regions wpre then 
formed by grounlno adlacent samoles with hues falling In the same 
Intervali Figure 2t is a false color presentation .of Figure 1c. 
with each samo-ls displayed in the hue of the 20 decree interval to 
Which it is classified. Flcure 15 sno«s the Partition resuitinc from 
this classification. 

Twenty decree hue intervals were 
thought to be conservative compared with the far cruder Quantizations 
employed by both vaiclmovsky and Lleberman, The- resuitinc partitions, 
however, were consistently worse than brightness partitions, Mador 
lea>fs occurred between semantically dlstinrt regions In both indoor 
and outdoor scenes. Moreover, in outdoor scenes* because of 
textural irreoularltles , the total number of regions was actually 
greater than ■ the number produced by a brightness partition. 
Therefore, we concluded that in the absence of semantic information, 
a partition based on brightness was superior to one based so.ieiv on 
hue . 

4.4,2b Semantic Classification 

One might expect to improve color 
Partitions in a eiven domain by selecting guantizatlon intervals 
correspondino to characteristic colors of the prominent oblects. 
Recall that Lieberman had obtained reasonable partitions using just 
seven carefully selected classes of hue. These results couid not 
be replicated in our landscape scenes becaus** of the overlapping hue 
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FIGURE 15 ' FIRST PARTITION OF LANDSCAPE PRODUCED FROM SAMPLES QUANTIZED 
INTO 20 DEGREE HUE INTERVALS (918 REGIONS) 
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dlstr ibutiorts of the princioal oblects, Results in rooir scenes were 
also unsatisfactory/ thouah for different reasons. While son>e 
interior surfaces have broadly distrlhuted hues fe.o,, Patterned 
ruos/ pictures/ shiny tabletoos/ etc.l/ the snalority have snarolv 
defined hues, Tn color coordinated rooms/ these hues tend to cluster 
in a very narrow ranae, deereasinn. the reliability of classification 
and increasinp the attendant risk of an erroneous merge. 

Although the overall partitions 
generated with semantic color classifications were unacceptable. 
Particular regions with distinctive hues could be reliably extracted. 
For example, the oranoe hue of the chair annearino In room scenes 
suoo] led by Reddy and nhlander Is unlaue in those scenes fsee Figure 
Id), Conseouentiv, image samples from the seat and back of that 
chair could he grouped Into regions without risk of false meroes* The 
number of regions that can be extracted in this way Increases 
significantly wpen classification is based on muitioie attributes 
rather than lust color. Sky, tor example* is the only region 
unsaturated in our landscape scenes brighter than 3d (on a scale of 
31). Tabletop is the only horizontal surface higher than two feet 
found In SRI office scenes. These observations prompted a change 
In emphasis 1 rather than attempt to use ranae and color data directly 
in the part 1 1 ionl no process, these attributes would instead be used 
in a preliminary nhase to extract regions composed of samples with 
clear semantic interpretations. Remaining areas of the scene 
(e.g,/ those containing semantically ambiguous samples) could then be 
partitioned conservatively based on brightness, 

A number of, experiments were 


PIIGINAU PAGE IS 
OP POOR QUALITY 
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oeriormed vhereln sa’"ples of di stlnaulaoina oblects vpre identifiofS 
and qroupfd into reolons nrior to oeneral oart J tiorlnn, A set of 
predicates were etpolrlcally developed for selectlno sa’^oles of eae?- 
dlstinoulshed oblert. These predicates are sjT'llar to oredieates 
used for fllterina in object acquisition (see Sectio»' 5i. Predicates 
were applied seauentially tn all unclassified In'aqe samples, Tnese 
Samples passlno a predicate were assjoned a corresnondino setanric 
label and removed frO'" further consideration, Samoles fallino aH 
predicates received a nonseoantlc label based on brightness. rh* 
Scene was then Partitioned in the conventional wav by combining into 
reqlons adjeinlno se'«'nles »;lth the sane labels. 

The two n^st successful results are 

* 

documented in Fioures 16 and 17, These results should be co-np^^red 
with the correspondlno hrlohtness based partitions shown in Mctures 
18 and' I9t Tables and 4,2 document the selective classification 

criteria used. The followino points are oertlnentJ 

(1) To sequential classification, later predicates may be 
able to take advantage of context reductions achieved 

by earlier Predicates, For eXanoie, the Picture on 

' ✓ 

the wall in Floure i -d Is a comolex pattern. However, 

It was removed by a simple heiaht criterion, once the 
Wall Samples had been accounted tnr. 
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FIGURE 16 SEMANTIC FIRST PARTITION OF SRI OFFICE SCENE (235 REGIONS) 
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FIGURE 17 SEMANTIC FIRST PARTITION OF CARNEGIE-MELLON OFFICE SCENE 
(151 REGIONS) 
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FIGURE 18 NONSEMANTIC BRIGHTNESS PARTITION OF SRI OFFICE SCENE 
(583 REGIONS) 
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FIGURE 19 NONSEM ANTIC BRIGHTNESS PARTITION OF CARNEGIE-MELLON OFFICE 
SCENE (441 REGIONS) 
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(?) Absolute classification is not essential i sfifpples from 
rtlfferert objects can satisfy the same classification 
rule without consenuenee » provided that those objects 
are not pictorlallv adjacent. Thus, no harm arises from 
the fact that reolons of both door and picture in 
Figure 16 satisfy the fourth rule given in Table '4«1 , 
since pictures In this domain are constrained to hano 
only on walls, 

(3) E'.ver> when objects cannot be completely discriminated from 
their nelahborSf It- may still be oosslble to null out 
substantial portions intact. The taoietop in Figure 

was partitioned semantically into two characteristic darX 
regions while an ambiguous glossy area in the center was 
partitioned nensemanticallv on the basis of brightness, 

(4) Selective classification schemes are essentially limited 
to Scenes where objects tend to be bomooeneous and 
distinguishable by local attributes. Thus, while 
selective classification has given us good results for 
indoor scenes. It has proved to have little utility in 
outdoor scenes. 


Given a suitable domain, it should be 
Dos’sible to use the acquisition Planner described In Section 5 to 
derive the classification predicates automatically f rom .pletor 1 al 
examples. The classification task should, in fact, be simpler than 
global aeguisition, since the predicate need discriminate only among 
ptctorialiy adjacent surfaces, 
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Table 4,1 


SELECTIVE classification CRITEPTA FOR .FIGURE 16 

1, - Extract floor samples by heloht f<0,i feet), 

2, Extract chalrseat samples by characteristic 
hetcbt and horizontal orientation, 

3, Extract tabletoD samples by characteristic 
heloht and horizontal orientation, 

4, Extract oleture samples Ih tuo Passes, 

a. By characteristic heloht and saturation 
oreater than maxlmijm saturation for wall, 

b. By characteristic height and hue outside the 
hue ranoe of wail, 

5, Extract chairbaeV samples by characteristic heloht, 
vertical orientation, and saturation. 

Partition remainlno samoles bv brightness. 
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Table 4,2 


SELFCTTVF CLASSIFICATION . CRTTEPI A FOp FTCUPF 17 

1, Extract wal.l samples bv their disttnauished 
combination of hue# sa'tMration» and brightness-, 

2, Extract olctu-re sarnies bv selecting, all 
remainlno .samples- In uooer third of Image, 

3, Extract chair samples bv their distinguished 
hue and saturation, 

4, Extract couch samoles bv their distinguished 
hue and saturation, 

6, Extract floor samples hv their distinguished 
hue and saturation, 

6, Extract tableto.n samoies by their dl st,lnguished 
hup» saturation^ and brightness,. 

7, - partition remaining samples bv brightness. 
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4,5 M«r«7e Priority Experinents 

The purpose of these exoerlTrents ras to deterwine 
some faltiy conservative nonsemantic measures o£ reoton similarity to 
use In definlna the meroe priority durina realon arowino. It Is 
Important to note that at some points all nonsemantlc measures will 
commit errors# since they have ’no hloher-ievel Information about the 
scene domain. An example that Illustrates 'this fact was sugaested bv 
Ya><lmovslcy/ who pointed out that in one part of a scene# it mav be 
appropriate to meroe small oreen and yellow renlons eomorisino grass, 
while In another part of the scene, where yellow reoions are part of 
a car, t^ls merging is in error. 

The nonsemantic merge orloritv criterion is intended 
to be used In conlunctlon with functions that apply semantic context 
to each' proposed merge, The priority function should establish a 

reasonable merge order based on similarity measures, soi that the 

\ 

semantic embedding can be oerformed on relatively well*deflned 
Portions of the- scene. 

The aualltv of nonsemantic similarity measures was 
compared by performing a first partition based on brightness, and 
then following a global best-first merge Seauence using each 
similarity measure until there were only 250 regions remaining in the 
Scene, The outcomes of the different proposed meroe seauences were 
compared to see how well they honored the correct organization of the 
scene. 


The first similarity measure tested was based only on 
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the trlghtness Infori^atlon obtained through a neutral density filter. 
For each boundary between two regions-, the averaae absolute 
difference 'In brightness across the boundary w^,s coirputed. This 
average is computed as follows? 

N 

(n Average absolute briohtoess contrast = V for - bru.l 

ah b> 


where . 1 ' ranges over the set of ■'i adjacent picture element pairs along 
the boundary between reoions a and h. tifa, is the brightness through 
a neutral density filter of picture elements in region A, and br^ is 
the brightness of corresponding points in region b. The pair of 
regions with the smallest brightness contrast was merged first.. The 
result is shown in Floure ?0, Several serious, though verv narrow 
leaks, occurred? the sea and ground were connected hv a narrow necvr 
to the left Of the main tree trunk? tne ground# sea, and horizontal 
tree branch were joined together to the' right of the trunk, and the 
center vertical trunk became meroed with the treetor, a major leak 
occurred between the msin trunk and the ground, gut this wg* not 
surprising, given the lo" contrast in that area. 

Better results were obtained uslno a measure that was 
like the first, but that was based on the sum of the contrasts in 
each color separation. Thus, for each point on the common boundary 
between two regions, the absolute difference in- r, g, and b was 
computed, and the average of this "color contrast" was computed' for 
each pair of adjacent regions. The formula is? 

N 

E '•'ai ■ •'b.' + - 9bi' + - ^bi* 

, ^ ^ 


(2) Average color boundary contrast a 
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FIGURE 20 LANDSCAPE SCENE MERGED TO 250 REGIONS USING AVERAGE BRIGHTNESS 
CONTRAST ALONG THE BOUNDARY (EQUATION 1) 
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where . 3a, * ai^d b^, are the brightness of the Itb boundarv 
element fron realon a seen through the rel.,oreen. and blue filters 
respectively, and , b^, are correspondi na data fron reginn b, 
and i and N are defined as above. The result of i>erglng regions with 
smallest color contrast first is shown in Figure 13, b)e also tried a 
Similarity measure based on the sum of sguared contrast differences, 
and obtained results that , did not differ significantly from the 
results using absolute differences fsee Figure 211, 


Another similarity measure was to romoute the average 
along the boundary between two adjacent regions of the maximum color 
contrast' between anv two oicture elements drawn from sampling 
neighborhoods on opposite sides of the boun^arv. The attraction of 
this method is that it should behave very conservatively, declining 
to merge two regions if there Is evidence In 'the full resolution 
picture that they are dissimilar. Unfortunately, this measure is too 
conservative, and noisy reolons, liwe the ground and treetop, fail to 
coalesce before distinct smooth regions have grown together. The 
results are snown in Figure 72, 


The similarity measures 
based or. boundary contrast nrooertles, 
based on the similarity of aver.aae regt 
of our • success with the color 
previously, we tried a similarity meas 
difference in average r, o, and b for 
The formula for this similarity erltert 


described so tar are aai 
Anoth<»n class of measures Is 
on orooerties, nn the basts 
contrast criterion discussed 
ure that used the absolute 
each nair of adjacent regions, 
on Is 
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FIGURE 21 LANDSCAPE SCENE MERGED TO 250 REGIONS USING AVERAGE COLOR 
CONTRAST SQUARED ALONG THE BOUNDARY (EQUATION 2) 
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FIGURE 22 LANDSCAPE SCENE MERGED TO 250 REGIONS USING MAXIMUM COLOR 
CONTRAST ALONG THE BOUNDARY AT FULL RESOLUTION 
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(3) Average reaioi' color -llffprepce = - Tj^I + ig^ - g,jl +Jbg - b^l 

Where » 9 ^ » and b^ are the averaae brlohtnesses of. reoion a seen 
throuah the rod# oreen» and hUie filters# and » b^^ and 

are the corresoondlno briahtness of .reoton h. The result# shown In 
Flaure 23# is very close to and blue filters. The result# shown in 
Figure 23# Is Very close to the result ustno color houndarv contrast. 

The best results were obtained usiho a techniTue that 
combined our two best orevtous criteria. We defined the contrast- 
between two reolons to be the maximum of the color bnimdarv contrast 
(Equation 2) and the average region color difference (Equation 3)# 
and compared these maxima for each .pair of adlacent regions. The 

results are shown in Figure 2i, This was the slmilaritv criterion 
used for subsequent experiments with semantics. 
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FIGURE 23 LANDSCAPE SCENE MERGED TO 250 REGIONS USING AVERAGE COLOR 
CONTRAST OVER THE REGIONS (EQUATION 3) 
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FIGURE 24 LANDSCAPE SCENE MERGED TO 250 REGIONS USING THE MAXIMUM OF 
THE AVERAGE COLOR CONTRAST COMPUTED ALONG THE BOUNDARY 
AND OVER THE REGIONS (EQUATIONS 2 AND 3) 
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4.6 


Semantic Realon GroA't'i 


We have seen in the Vealon otTVlno experiments above 
that r’eoardless of the -priority function* sooner or later an 
erroneous meroe is nroposed. Semantics can be used either’ to refine 
the boundarv strength criteria so as to propose fe'^fer erroneous 
meroes [17] or else to bloeV- nrooosed meroes that are incorrect. 
Steppinq throuah meroes ■ or oposed by the nonsemantlc color difference 
criteria Illustrated In Flour's 24, It was observed that' serious fa-lse 
merges Seldom occur until the regions involved have grown 
sufficiently lame to permit semantic interpretations hased on region 
properties. This observation would suggest that merolno errors 
could be avoided on semantic grounds simply bv refusing to meroe 
region* with different interpretations, 

4,6,} An F.xoeriment in Semantic Region Growth 

The above contention was" tested 
interactively. The basic nonsemantic region growing algorithm was 
modified to checJc semantic comnat ibl litv before performing a ornposed 
merge. Merges are approved only if both regions carry the Same 
Interpretation, or If at least one ' of the regions ’ has not vet 
received an interpretation. Newly meroed regions inherit the 
interpretation of their parents (or Parent if only one carried an 
interpretation),. When two uninternr’eted regi'ons are merged, if the 
size of the resultant reoion e'xceeds a threshold (in oractice, the 
size of the smallest region typically involved In a significant 
erroneous merger], the program reouests t^e experimenter to sunolv 
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manually a correct interpretation. 


The modified prooraiti still runs orlmariiv in 
a nonsemantie mode, merqinn reolons across t*^e currently weakest 
boundary, subject to semantic override, TMs process continues until 
all semantically compatible merges have been performed, terminatino 
with a complete, semantically labeled partition of the scene. An 
indoor and an outdoor scene were each Partitioned with only minor 
errors Cdue mainly .to inadequate spatial samollnql. The results are 
shown in Figures 25 and 26, 

In both experiments, the size threshold for 
manual interpretation was set at seven samples. However, the great 
majority of regions inherited correct -labels through merging- before 
attaining that size. The final Partition depicted In Figure 25 was 
based on the -semantic first partition shown in Figure 16, Manual 
interpretations were orovlded initially for the 20 first partition 
regions tout of 235) that exceeded threshold size. Twenty additional 
interpretations were provided during the subsequent analysis when 
uninterpreted regions attained threshold size bv merging, Aii ma1o,r 
surfaces were extracted essentially Intact except for chalrlegs and 
tablelegs. These surfaces were too narrow fgr the available 
sampling density, and therefore corresponding semantic categories 
were omitted. The final partition in. Figure 26 was based on an 
initial brightness Partition (Figure 10). Although this .first 
partition contained substantl'al.ly more regions than the semantic 
partition used above, 'tKe; number of (Initial and total) manual 
interpretations required was surprisingly similar. In particular, 2i 
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FIGURE 25 FINAL SEMANTIC PARTITIONING OF SRI OFFICE SCENE 
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FIGURE 26 FINAL SEMANTIC PARTITIONING OF LANDSCAPE SCENE 
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first partition rpalon* wer<» laroer than six elements (Fioure 27i an<? 
recejve^i initial interpretations. The final oartitlon (Flaure 
contains minor leaks between the sea and nround (in the lower rlnht 
corner of seeneV and between two fraoments of sea across the left 
most ■ tree llmh. The sea-oround lea^: could have been avoided bv 

settino the labellno threshold at six samoles» the size of .the sea 
fragment when the leak occurred. The- sea-sea leak was a more 

fundamental error# resulting from a oradual erosion. -of small "tree 

* 

bark" regions Into the nelgnborlng sea regions before- a cohesive 
piece of "tree bark" could he Identified, 

The above results confirmed the original 
contentions that leakage occurred mainly between sizeable, reolo.ns and 
that leakage could be minimized# orovided tnat such- regions couid- . be 
semantically labeled, It was surprising how well a handful of 
manually Introduced Interpretations could Improve on a compietelv 
nonse.mantlr scene Partition, 

4,6,7 An Experiment In Semantic Peojon 
Classification 

The success of the above .experiments 
suggested a second set of experiments aimed . at studying the 

feasibility of .assigning interpretations aut,omaticaHv to regions 
that attain- thresheTd size. In the following experiments# regions 
were classified by comparing local attribute.? (e.g,# hue and 
saturation distributions) with those of Targe training regions 


0? POOR 
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desionflted by the exner itnepter . 

Classifications were perfot?''ed on each of the 
first Partition regions that had received a manual interpretation tn 
the experiment described in Section 4,6*1, In room scenes* most 
regions were uniquely dlstlnoulshed by a coniunctlon of local 
attributes which Included height «nd surface orientation (from 
simulated range datal In addition to hue and saturation. The large 
regions in Fiaure'l6 were uniquely interpreted with the exception 
that the bottom of the door and the wastebasket could not be 
distinguished, Tn outdoor scenes* local discrimination fbased only 
on hue, saturation, and brightness) proved much more difficult. Tn 
this case* automatic classification based on ■ hue* saturation* and 
brightness eliminates only the grossly unacceptable Interpretations*- 
leaving many ambiguities to be resolved. 

Table 4*»3 summarizes the results of 
classifying regions In Floure according to the training regions 
outlined In Figure 2e, Each' test region wgs compared .to all trainino 
regions using a Kolmogorov*Smlrnof f test .on the corresponding 
distributions of brightness, hue, and saturation (see Figure .7), 
These comparisons establish for each test region three rank orderings 
of similar training regions. Training regions that appear In first 
or second Place In at least two of these rank orderings are- chosen as 
possible Interpretat'ions for the test region (oresent-ed in' the third 
column of Table 4«3), ft second type of classification was obtained 
using a Bayesian classifier described In Apoendl-x A, Only 
interpretations with a ll)<:ellhood,>j 4 g^'eater than lo percent of the 
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FIGURE 27 REGIONS FROM FIRST PARTITION OF LANDSCAPE SCENE (FIGURE 10) 
LARGER THAN 6 SAMPLES 
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FIGURE 28 TRAINING REGIONS USED IN CLASSIFICATION EXPERIMENTS 


76 











llvceli^ood of the most probabJe Interpretation •'?pre retained. The 
results are shown in the fourth column of Table 4-3» with the 
Baveslan lUfelihood (between 0 and 11 In tsarei'tneses , The correct 
Interpretation is not always the best T'atch but Js almost always 
amono the top two or three alternatives. In several instances the 
interpretation amblaultles that occur are not serious because the 
alternative jnteroretatlons never aooear olctoriaily adiacent, ^ote 
that these results are for tralnlni reoions Jn the same scene as t’he, 
test data# Worse results can be expected for distinct scenes chosen 
from a common domain. 
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TaH)P 4.3 


CLA58IF1CATT0"^ OF PFGTO-iS I 'i FIGMPF 27 
ACCOHDINC xn TPAI^'ING RFC^TOmS If'* 25 


riton 

Correct 

Intproretati on 

»Miority F 8 
Classifier 

Classifier 

1 ' 

Treetop 

TreetoD 

Treefop ( .?36) 

2 

TreetoD 

TreetoP 

GroitP’Jf ,? 2 «) 
TreetoD ( .?) 

3 

Tr eetop 

Treetop 

Groun-lf ,20S) 
Treetoof .202) 

4 

TreetoD 

Treetop 

Rrounat ,223) 

Tronif(.l'5 4) 
Tree*^OD( , 133) 

5 

TreetoD 

'Trunk 

Treeto! 

GrounHc'.2 J ) 
TrunWT.pPs) 
TreetoPf . 1 

6 

TreetoP 

TreetoD 

treetopf .757 ) 

7 

Sky 

Skv 

SlcV(.qs57 

8 

Sky 

5<y 

Sieve 1 . 0 ) 

0 

^^ount Bln 

Sea 

♦'fountain 

Mountaip t.‘'303 

10 

^*oVnt a 1 n 

Mounta i n 

*'r>I)*italri ( ,7‘J8) 

1 1 

Mount a 1 n 

Fountain 

'Jiountalpf ,Dfi9) 

12 

. Hountair 

fountain 

,653) 

13 

1 al r 

Mountal n 

WoUnf.ai ,963) 

14 


Sea 

Seat .502) 

15 

Trunk 

Ground 

Grounaf ,?oi 3 
TreetoK ( , 1 0 1 1 
Truni-- f . 161 ) 

16 

Trunk 

Trunk 

TreetoD 

Tr0pV(,746) 
Gro'ipat .224) 

17 

Trunk 

Treetop 

Grcnjn'ir ,217) 
Trii"kf . 147) 

1 « 

spa /Pocv 

Sea 

. Fountain 

TrunV-t.337) 
Treetopt.?S2) 
Rnrift ,0921 ) 
Seat ,0?62) 

19 

Trunk 

Trunk 

Tru-ikf’,206)' 
GroilPdt ,2C5) 

20 

Ground 

Grourd 

GroundC ,205) 
TreeToP(.196) 

21 

GrounH 

TreetoD 

Ground 

Groupdt .?19) 
Treetopt . 196 ) 
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4,7 Tflvarrt Aotomatic Pealor Analysis 

The classification experiments sho" that in order tn 
automate semantic reolpn arowlno-» the basic naradlam must be 
auamented to handle regions with ambiguous interpretations. In this 
section, we shall outline briefly some plans for how this mlaht he 
accomplished. 

In admitting regions with multiple interpretations, 
nrovislon must be made (a) for determining the semantlr eompat ibl 1 i t v 
of a proposed meroe, (b) for determining the interpretation o.f ,a 
newly merged reoion, and (c) for ultimately resolvlno any 
ambloulties , Newiv meroed regions win acnulre the set of possible 
interpretations formed by intersecting tne inteoretation sets of its 
parent regions. If this set Is. emoty, the meraer is ohvieuslv 
Incompatible, ntherwtse, compatibility depends on whether the 
Interpretations left in the intersection set are ever Pictorial Iv 
adlacent In the domain of, interest. This point is best illustrated 
by example, in Figure Ic two regions, each class! fled .as "treetop or 
ground," can always he merged without rIsK of a leak because 
"treetop" and "oround" are never adiacent in the Imaoe, Thus, since 
these two regions adjoin, they must hav* the same i nterpretat 1 on t 
either both are ''trepton" or noth are "ground." Ho the other hand, 
two regions’, each labeled "treetoo or treebark," should not 
necessarily be merged because an error would occur if the actual 
identity of one region was "treetoo" while the other turned out to be 
"treebark," in the latter case, a decision on the merger must be 
deferred until the individual region interpretations can be further 
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constraired, 


Ambiouous realon interoretat ions can be refined in 

I 

basically two wavsi bv considerina additional* olobal reoton 

attributes (e.o,, texture and sbaoe)* ana bv considering contextual 
constraints Imposed by the possible Interoretat ions of nearbv 

(Drimarllv adlacent) realons. For Instance* reoiops of "treebarx” 
"ground*" and "treetop*" In FJoure Ic have similar color and texture. 
However* "tree trunk" reoJons* unlike most regions of "around" and 
"treetop*". are Predominantly thin and vertically elonoated te.o.. 
high ratio of vertical perlmeter/bori zontal nerimeter). Treebark 
regions are further distinguished by their extended vertical 
boundaries with their neiahbortno regions of .".mountain" and "water". 
Such distinctions will be exploited by -Jeveloping* with TSiS* a,d hoc 
domain dependent representations tor distinguishing among particular 
interpretation ambiguities. 
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The «bove ideas evolve'^ fro" t'*'d related ohlectlvess 
fa) we wanted a semantie reoien drower that could use the iflnd of ad 
hoc semantic constraints that are so eajsy to develop Interactively ir 
iSISj and Ch) we wanted a svste' that could be trained Incr emental-lv 
by introducing new semantics to block specific errors as they are 
observed’* The proposed sysfem uses two types of se.-nantic rules* one 
set defines distinguishing features of ' part icijl'ar interpretations* 
while the other defines which Interpretations’ can be legally 
adlacent. Erroneous’ meroes can occur if the regions involved are 
mislabeled* or if the system is unaware that the assinned 
interpretations mav be adlacent. In the former rase* a trainer rouio 
use ISIS to refine epPlrlcaliy the loca) and contextual features 
associated with- recoonlzino the misslno interpretation. Tn the 
latter ease* he woujd merely provide an explicit constraint 
prohihltlng the mprae’ of reolons with those assigned Interpretations , 

Our proposed merger semantics are dnsest in spirit 
to the structure rules advocated by Harlow and Elsenstat# whicn could 
be added incrementally to block soerific leaks. "Jew rules could, 
however* Interact with exlstlno rules In wavs that are difficult to 
predict a priori, Yakimovsky's Havesian semantirs, on the other 
hand, are Intrinsically interdependept ani must be arouired over manv 
trials using stochastic learning procedures. The probanliities 
are also not used directly to block merges put only to order the 
priority in which meroes are proposed. It Is cn-isecuentlv difficult 
to introduce semantic constraints in direct response to speciHc 
errors. There are other minor contrasts petwepn gur proposed 
system and Yakomovsky's , In our system, no distinction is drawn 


81 



between svntactic and semantic • stanes, .Semantics are 
continuously .introduced Into the analvsis a* individual re.oions 
achieve .a coherent size, we also do not subscribe to Ya)ciniovs)<v*s 
assumption that boundary and reqion semantics are independent. 
Rather-# it would seem that boundary relationships may. on occassjon. 
be the stronqest clue to a reo'lon's identity, which in turn could 
improve semantic hoimdarv strength. estimates used in setting meroinn 
priority, lYalcimovsicv's System, howevet, is stm the most successful 
automatic region analyzer implemented to date. 
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4,8 Toward Interactive «eqior Analysis 

For the foreseeable future* a larne number of- scene 

^ t# 

domains will be too complex to process comoleteiv automatical iv. Some 
of these scenes are also too detailed to segment raoidiy bv bang. For 
such scenes* an interactive methodology like that used earlier in 
studying semantic region analysis may prove useful In its own rlrrht 
as an‘ alternative to purely automatic or purely manual Partitioning j 
the human time and effort In entering roughly 20 semantic labels are 
probably less than the time and effort renutred to outline each of 
the 50 regions in the final partition of Fiaure ?h. However* when 
interactive region analysis is' Identified as an explicit goal* 
numerous" improvements come Immediately to mind. The experimenter 

can* with relatively little effort* crudelv outline and label the 
ma1or regions. This crude outline can be used directly as a good- 
initial Partition from wnich detailed boundaries can. be taPldlv 
grown. The outline can also provide training data for automatl ca I 1 v 
classifying tbe few addi'tional regions that mioht attain threshold 
sl 7 .e without Inheriting an Interpretation from the initial regions, 
(■jote that for this purpose* generality of the classifier over 
several scenes Is not eve.n an issue.) 

Two ether schemes for interactive region analysis 
were tried experimentally and are reported here* for completeness. In 
the first experiment* the scene was conventionally Partitioned into 
regions of homogeneous briohtnessj however* the oiobai priority queue 
was not built. Instead* the user was'asxed to oolnt at significant 
obiects in the' scene with a cursor. Facb time f'e pointed, the 
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boundaries between the first partition realon at that location and 
its nelohbors were entered onto tne oriorlty oueue. When the user 
had indicated what he felt were the ma1or scene entitles, the system 
began to merge .regions nonsemantlcal ly acro.ss the weakest boundary 
currently on the queue, After each merge, the gueue was updated to 
include additional boundaries between tbe new region and its 
neighbors, We expected that regions would, grow out from each 
starting kernel, and since meralnc was still done on a "best first"' 
basis, that regions would halt at object boundaries. This 
expectation was not realized in experiments on landscape scenes 
because of the similar coloring of many obdects and the textural 
Variations within oblects, Soatial oroximjty thus became a Primary 
factor in determlnlnn the kernel to which a given picture element 
would, ultimately become attached, .Still, this method might be 
useful for rapidly obtaining a crude partition, provided thats fa) 
the kernel .regions, are large and centrally located within each 
Object, and (b) the merging process js terminated early enough (sav, 
with 100 regions remaining). Users were aided jln their choice- of 
kernels by superimposing the actual fir.st partition regions or tne 
displayed scene, ■ 

In the second experiment, semantic labels were 
ass'loired to the selected kernels, and were subseouentlv used to block 
incompatible mergers. The resulting partitions ware consistently 
better than the nonsemantlr version. However, gualltv still 
depended critically on the number of kernels selected and on their 
Placement, For example, leakage can be minimized by choosing a 
succession of kernels along opposite sides of a desired final 
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boundary and then 'assionina different semantic labels to vrernels o^i 
each side. Selecting at least one vernei jn each PletoriaUv 

isolated patch of an ohiect fe.a,# fraaments of sea Isolated by tree 
limbs) allows meraina to continue further toward a comnlete oartitlon 
without eommlttino errors, 

Allowina a user to choose se">antle kernels all at 
once consumes considerably less of his time than requirlna him to 
stand by to supply Interoretations throuohout an analysis. On the 
other hand, it is difficult to know a priori which kernel reoions 
will Ultimately lead to good oartitlons, Conseauenti y# partitions 
resultino from semantic kernels were poorer than those obtained when 
interpretations were reauested during the analysis, ft contrlbutino 
cause is the fact that in the kernel scheme# the best meroe candidate 
is selected from a. restricted universe of reoions directly descended 
from the original kernels, rather than the set of all first partition 
regions. The >«ernel scheme Is thus much more likely to propose 
erroneous mergers early in the analysis, when this discrepancy Is 
greatest, A user's cost-effectiveness mioht be oottmized bv some 
combination of these aporoaches wherein reguests for Internretat? 
can be minimized by supplying initial kernels. 

The semantic kernel approach might prove more suited 
for partitionlno a scene into two classes (i,e,, figure and ground) 
than for obtaining a complete Pattitiontno, We Plan to investigate 
its use as an alternate means for inferrlnc intent when a user points 


at an ohiect 



5. 


automating GF^•EHATIOH OF OBJECT FINDING STPaTEGIFS* 


5,1 Introduction 

In this section we describe a system that can be 
rapidly prodramm.ed to find »ajor surfaces In relatively conoiex 
real-world environments, Oblects are desionated to the system bv 
circling examples with a cursor in a dlsplaved imaoe. The svstem 
formulates a strateoy tor flndlno the obiect# based on knowledge of 
available picture processing technloues and abo"t the makeup of the 
current pictorial domain. The resulting oroaram Is then run on 
representative scenes and d'ebuoaed interactively as errors 
materialize. This work is oart of a system that can be rapidlv 
oroorammed to find objects In office scenes as described in til. 

Previously we have used TRis to develop interactively 
a set of procedures for fjndino objects in office scenes. The basic 
Idea underlvinn these strateojes Is to rapidly dlsguailfy obviouslv 
irrelevant areas of the scene by samoilno for character istic 
attributes of the desired oblect falso referred to as the target 
ooject or taroetj*. Additional local properties are then tested to 
eliminate those samples belonoing to other obiects that share the 
acquisition attributes. The surface contalnino the acquired 
samples is then bounded, and the resulttoa reolpn is tested for 
appropriate size and shaoe, Proorams of this sort have been 
developed for flndlno various objects commonly found In office scenes 
(including door, table, chair, wall. Picture, and floor). 

*»lor.lc reported in this section was supported In part bv the Advanced 
Research Projects Agency under Contract DAHCn4-72-C-0008 , 
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A Plan for finding a door mioht consist of acaulrlno 
Samples at a height ilkely to contain few thtnos besides a rtoOTf and 
then validatlnc by ellminatina those samples which, on the ba.sis of 
remaining local attributes, have low likelihood of-beino part of a 
door. The door boundary is then obtained bv growing a region around 
the initial samples* using attributes sucb as hue and saturation, one 
alternative aoproach would be to first scan downward from the top of 
the image to a Point whose heloht Is unioue to door and wall. The 
edge of the door is then obtained by scannlno horizontally from this 
point, seeking an abrupt change in saturation, fDoors In SBI offices 
are deep brown while walls are tan,) Tf this edae has sufficient 
vertical extent, the door boundary Is then inferred from the known 
shaoe and size of our office doors', and tne local surface orientation’ 
measured in the image on the door's side of. the di scontlnultv. 
Finally, the predicted boundaries are confirmed bv testing for 
evidence of edges in the image. 

The above strategies can be senematlred Into 
acgulsition, validation, and bounding phases. Specific teehnigues 
for accomplishing each phase are chosen from available methods on the 
basis of current information about the pictorial domain, 

5,1,1 Acduisitlon Methods 

The most straightforward way of acauirino 
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samnlBS of the oblffct is to filter ran-^o-n nicture saiiPles* reiectino 
any sample that falls to pass a predicate descriptive of the desired 
oblect. For example* the sampled scene »av be filtered to retain or>lv 
those samples at a certain heloht. Alternatively, the scene can be 
sampled alono a locus to detect d1 sttnoiiisbtnq. hounderv 
discontinuities. These predicates are generated automatical ly from 
Pictorial examples, 

5.1.2 Validation -Methods 

Acnulred samoles are validated by cheekim 
addlt.ional local characteristics that disttnoulsh the desired oblecf 
from others with common acquisition attributes. These additional 
attributes are examined seouentlallv, reducing the number of samples 
that must be examined by successive tests. Total cost is minimized bv 
deferring computationally expensive procedur-es, '-‘’hen the cost of 
olanning the optimal order exceeds the known cost of uslno all 
remaining attributes, samrles are classified with resnect to possible 
oblects. Those samples ludned likely to belong to the desired oblect 
are retained. Samples' alono a boundary are further validated bv 
testing the boundary length, strength, and orientation using suitable 
masks , * 

5.1.3 Bounding Methods 

The final bounding Phase is accomplished in 

r 

various wav$. Given a suitable geometric model of the oblect and a 
prolective camera transform, a orocedure can compute analytically the 
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best alobal boundarv con'sistent with detected edce oolnts, T'^e 
cOTiputed boundary can be projected onto the i^aoe and validated usin«i 
extent masKs as above. Procedural models are available for common 
surface types such as horizontal -and vertical rectanoies. In the 
absence of such models, -a reaion mav be orown around Validated 
Dointsi neighbors of validated samples vhleh pass a chosen predicate 
(Often the acquisition oredicatel are added to a ousb-down stack. 
Neighbors of points on this stack are then examined recursively. All 
accepted points are collected together Into a region# for which a 
crude boundary can be obtained by" comouting a convex bull, 

5,2 Automatic Generation of Predicates 

The work- described here is an attempt to automate 
three key decisions involved In Planning a distinguishing features 
strategy t 

(1) What attributes should oe used for aeguirino 
initial samples? 

(2) What attributes should be used for validating 
those samples? 

(35 What attributes should be used in region 
crowing to obtain a complete outline? 
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All three -decisions involve rtistingulshino a desired surface in a 

\ 

context of other surfaces that mav he oresent, Acauisltion# for 
instance# involves distlnoulshlno a surface from other known surfaces 
in a domain# validation involves distinoulshlno a surface from other 
surfaces known to satisfy the acquisition attributes# and boundino 
involves distinauishinq a surface from all surfaces that could 
Dossiblv be adlaeent to it in the ima-?e, 

Tn order to explain predicate aeneration# a 
dioresslon to describe the data base is required. Objects are 
characterized bV local surface attributes, symbolic properties, and 
relations with other objects. The local surface attributes that are 
currently used as descriptors arej hrld^tness# color fsenarated into 
hUe and saturation eomoonents) # and when ranqe data are available# 
heioht and local surface orientation freferred to as ''orientation''). 
Local properties are stored as cumulative histocrams renresentino the 
probability densjtv functions for the object's attribute Values. 
These histoqrams allov the "prqbabilitv of an obiect havlnq an 
attribute in a qiven ranoe to be comouted by a sl’*>ple subtraction. 
These hlstoorams may be auqmented bv symbolic ornpertles such as 
boundarv descriptions# site and extent values# surface 
characteristics fe,a,# horitental), and spatial relations (e,q,, 
adjacent to# above# or left of). The cre.ation and modification of 
these descriptions are described below, 

'A detector Cor a "simple detector") is a procedure 
that checks whether a sample's attribute value lies within a o5ven 
contiquous ranqe of attribute values, a "composite detector" is a 
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conlunction ef simple rfetect'ors, A “predicate" is the eorrespondina 
LISP propram for a detector, A predicate returns T (true) if the 
value is within the ranae, and NIL (false) otherwise. The terms 
detector and predicate will often be used synonomous Iv. 
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5,2,1 Oeslan of Acquisition, Predicates 

Fiiterinc randotplv selected ■ sanries with a 
predicate that accepts only those of interest has proved to he an 
extrernely useful acouisltion tool in manual strateoies. Acquisition 
predicates should pass a few points on the desired obleet while 
excludino all others. They should be as cheaP and reliable as 
possible.. Naturally, these requirements are somewhat 

contradictory; usually a detector that allows most points on the 
target object to pass wii] also Pass seme Points from other oblects. 
To overcoipe this di-ffleulty, the program attempts to Generate 
compound detectors that exclude all samples from undeslred oblects, 
allowino at least a fe» points on the desired opiect to Pass, The 
program returns a list of any undesired objects that cannot be 
completely excluded, so that these may be distinguished subsequent to 
filtering, using more expensive olobal attributes. 

The expected cost and confidence of aPDlyinq 
a detector can be computed for each object. The confidence of an 
object oiven a detector Is loosely defined as the probability that' a 
sample accepted by the detector belonos to the object ‘ln question. 
The complete derivation is contained In Appendix A, Cost is 

defined as the expected cost (measure.i In milliseconds of CPU time) 
of application of the detector to the image. Given a detector and 
the expected number of points fro.m the set to be filtered that win 
both be Part of the object and be accepted bV the detector,- it is 
possible to compute the sampling frequency and the anticipated cost. 
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This tierlvation is also contained In Apoen^ix A, 

The oualltv of a detector is evaluated usina 
a heuristic function of cost and confUenee that rejects the design 
oriorities listed above, with this detector guajitv function, Q 
Cdescrlbed belowi, the oeneration of oood compound detectors,- becomes 
[for combinatorial reasons^ a problem in heuristic search. The 
search proceeds as follows, First a oood Initial set of simoie 
detectors Is selected. The candidate detector with the best n is 
then combined with others to see if the combination Improves the 0 
function. The best combination, in turn, is combine-1 with remainino 
candidates, and so on, until a terminating condition Is reached? 
either the set of candidates Is exhausted, some combination exceeds a 
oreset a threshold, or some effort bound is exceeded. 

The heuristic aualitv function, o, ,a/hich Is 
used to order a set of detectors, is shown oraonicaiiy lrs_Flour<» 79, 
Its maximum value is the confidence of the oh-lect and detector fwnich 
cannot be greater than J.Ol, and Its minimum valu» Is o. The cost of 
aoplylng the detector determines where the detector fails on the 
curve. For eduai confidence, and costs below a certain minimum, 
MTNCST (currently eouai to 2 seconds!, the G function is indifferent 
as to Which detector Is selected. For costs above a certain 
maximum, maxcst (now set to 100 seconds!, the function returns n. In 
essence, any cost below vin^ST is effectlvelv zero and anv cost above 
MAXC5T is effectively Infinite. In between, the curve is linear with 
a slope determined bv the confidence. ■'•*TwrsT and mAXCST, Hv 
maicing the maximum value of q be the confidence, added emphasis is 
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FIGURE 29 HEURISTIC QUALITY FUNCTION Q 



given to the goal of prod'-’Clno high confidence tests. Of 

competing tests# the one with lower confidence must also have lower 
costs in order to be considered better, 0 was defined empirically to 
orovide the functional characteristics detailed above. The Dafametcr 
values were also chosen emplricallv on tbe basts of best results. 

Two heuristics are used to select candidate 
simple detectors. The first uses the function, peaks, to locate Peak 
attribute values for the ohiect, PEAKS examines the character ization 
of the object (-which are probability densitv ciirves and wm pp 
discussed later) for a Peak providing a certain minimum area 
underneath. The function will then narrow the oeak if it can do so 
without slonlf icantlv decreasing the area, Minlmuir acceptable oeak 
size' and other variables are empiricallv determined Parameters, 
Peaks, which currently can only find a single oeak for each attribute 
curve, outputs a set o'f simple detectors corresponding to the 
attributes considered. 

The second selection nrobram, kinpangf, 
generates distinguishing detectors bv looking for continuous ranges 
of attribute Values which minimize the set of ambiguous objects. The 
program scans a range of an attribute for the target object, checking 
to see what other obieets share the -range, minramge also returns a 
set of single detectors. 

The present implementation of the heuristic 
predicate generator creates only composite detectors from the initial 
set supplied by PEAKS and MIMRA^'GE, The program cannot yet generate 
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detectors that are dJslunctions (such as he reauired for 
loeatlnc a red and white chectered tablecloth) or decision frees. 
Despite these limitations (which win probablv be removed in the 
future), the detectors generated are quite effective. 

To qenerate these composite detectots.i the 
Drograip, DflCO (Default ACQuisitlon) » selects the current best fin 
terhs' of Q)‘ detector on’ the list and tries to combine it with all the 
simple detectors on the Inltl’al list. Detectors that ’vould violate 
the description' oiven above (e.q.» which’ would combine two detectors 
with the sane attribute), or those In which the composite does not 
have a smaller ambiguity set than the parents are not considered. 
When a candidate is selected, a composite detector is formed and 
'retained only if Its value of o Is' hjaher than that of both ‘parents,- 
Before addino the new candidate to the appropriate spot In the list 
of detectors, a auicVr checlc is made to see If it could ever be ’oetter 
than the current best detector# This Is done by computlno 0 for a 
hypothetical detector with the same cost, but with a confidence of 1, 
If the 0 of this ‘combination is higher than tne best 0 so far, the 
new candidate Is )«ent. Otherwise, it is discarded. 

A candidate that is retained ’.is added 'to rhe 
proper location in the ordered list of candidates. After the first 
detector has been tried with all the possible single detectors, it Is 
marked as having been expanded, and DACf) iterates with the new list. 
If no candidates are retained, DACQ continues with the next best 
detector on the list. 


When there are no more candidates, or when a 
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detector Is Generated with a o above a' certain thres^oJd# the cronra'" 


terminates^ returnlna the complete ordered list of detectors 
generated , 

DACQ • spends considerable time attemotina 
minor Improvements to its best candidates,- 'Therefore, a CPU time 
limit was provided to allow the user to decide how much effort hACO 
should expend. If the time quantum provided hv the user is 
exceeded, the prooram -interrupts, prints the best candidate, and asics 
the user whether it should continue. If the rePlv is no, or there is 
no reply within ten seconds (eautvaient ■ to a no reoly), dACo 
terminates with the best so far. If the user wishes 'to -proceed, DACo. 
will- continue for another Quantum of time. The predicates produced 
by DACO Within- the time constraints (currently about 10 seconds-i are 
typically the same as those produced by lettlna it run to completion. 
Also,, the detectors produced are usually as oood as,- or better than, 
predicates defined, manually by experienced users, and often they are 
the same. 


The final steos In the automation of 
filtering are straightforward. The best compound detector Is 
converted to an equivalent LISP LAMBDA expression, which is then 
compiled and saved. The system already icnows what sampling density Is 
reguired for the detector, since It is a bY-oroduct of the cost 
computation (see Appendix A). The Image Is then random.iy sampled 
at the' appropriate -density,- and the resultino samples are filtered 
with' the LISP predicate to obtain a fe-^ acQUlsltion samples. These 
samples may need further- validation tests to discriminate residual 
amblauitles, Examples are found in section 5,3. 
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5,?,? • Oes'lon 'of Boun-^ln-s pr^'^lestes 

The Text sten in obleet fjndin'7 ,aft.er 
valldatlno the acoulsition samoles is to extract the boundary of the 
surroundlno surface, Bounnino a slnole soeciflc obleet is ,a 
slonl f icentlv different test than the olobal scene nartlt tenlno 
described in Section iv of this report. 

In ohiect boundlnrr,' a reolon is orown fro’n 
acoui s iti on. satpp les by includlna all contloubus sa^oies that satisfy 
an appropriate hounfilra eradicate, Soec 1 £ i ca I iv, the prooram, GPnw, 
apolles the oredlcate to sa-nnles adjacent to the acoulsltloh savoles, 
tf a satipie Is acceoted, its neinhrors (either 4.»adjacenr or 
8-ad1acent) are added to a list to- pe recursively exanined, "''hen 
all posslMe sairples have accreted to tbe orlolnal set, GBnW returns 
the resultlnq nev reolon. 


The boundino predicates differ fror 
aco'ilsltior credjeates in t-Af© Inoertant vavs. First, the particular 
reolon being bounded need only he distinguished froi* Pictorial Iv 
adjacent objects, as. opoosed to all ooiects in -the scene. Second/ 
reolon orowth reoulres predicates capable of eollectJno all nolnt-s on 
the object. 


The basic reol.on orower can be speeded up by , 
scannlno out horizontally and vertically from tne initial acquisttinr 
samples until a discontinuity Is- detected. Points located on the 
boundary are then joined into a ne-w, laroer reolon, and GRPH is used 
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to fill in anv holes* Scanners are oood for ouicliflv loeatlno edoes* 
but suffer by.requirino that all points between the starting point 
and the desired edoe oolnt oass the Predicate, A Point that fails 
terminates the scan. The reulon grower is less directed, hot as a 
result, can "go around" isolated points where the predicate falls. 
The combination of scanning and o'rowina has proven very effective. 

Since the predicate must accept all points on 
the object, the tas!^ of oeneratlho. an Initial predicate is Oreatlv 
simpllf ied»«we need only create detectors ■ based on the complete range 
for all attributes fa "full -range detector"). There is relatively 
little risk in using -these wHe range detectors, since they need only 
discriminate objects adjacent to the acquired object, l^gwever, it the 
system has specific knowieae about object adjacencies, the fu.il-ranoe 
predicate can he refined by conjoining It with negated fuii-ranoe 
predicates for those adjacent objects. 

A major shortcoming of the above approach Is 
that the region crowing predicates are apoiied uniformly to all 
points. Often, it is known that the color fsav) of a region is 
darker at the bottom than at the too, such information should be used 
to Produce more selective predicates that respond differently at 
different points jn the region. Predicates could also be designed 
which continuously change their expectation of what should come next, 
based on what has recently been seen, 

other shortcominos are due to poor predicate 
specification from the outset. This problem can arise wnen the 
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system receives an Ineonolete iescrictlon of the ohiect inittaiw, 
oerhaps not knowlno enoyoh about other objects in the environment, or 
Often not knewino of oood -iescriotors i^hiph, eitho'ioh obvious tn a 
person, way not be obvious to the syst-ei*, Peme^ies for these 
shortcemlnos will be discussed in an upcemjno dissertation £ 251 , 

In the cow<no months, . the semantic 
seomentatlon technioues described in Section iV vin be sttidied as an 
alternative means of boundiro objects, Cpnw • functions strictly bv 
orowino regions outward fro* initial kernal Points on the taroet 
Object, The other sectmentation system could be used to nrow 
simultaneously from kernels on the tamet and on adjacent objects. 
Additional knowiedde about relatlonsnips of the taroet and its 
neighbors could be used to obtain improved boundaries. 
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5.3 


Cxa^tr] es 


floures 3P fbrf'un^ 37 llJ-istr^te 

orneess uslra t^e lrtao*» s^ovp In fItutp In, T^e ohierts tn be 
located are t^e nictnre and tt>e n'alor surtaxes of t.ne rnatr. tbe seat 
and the bar*?. These orlects Calnn^ #ltn others that anrear In the 
Inaoe) were shewn to the svsten in other Inaoes of the sa^e scene. 
Descriptions were automat leal iv oenerated, and fro* these, the svste* 
Was able to create strateales for loeatino the oblerts. In the 
fiaures. the reolons created In the course of lecatlno the onieets 
are oenerallv shown as a hnundary with a tew points inside. The 
boundaries fin this case) are convex hulls cteated fro* the sannies 
by ISIS in order to emphasize the reolon for the user. In the l.isp 
predicates mentioned below, the function T I'*ITP Is used as the 
orimarv detector function. Its for*at ist fTlMTiP attribute samnie 
lew hioh), IiT^'TTP accepts samples whose attributes 11* within tha 
range from low to hiah. 

The first example sho*s the system loratim tha 
picture. Picture 3o olves the result of filtering a random sampling of 
the image with the medicate n.AMBDA (X) fldmiTp height x t,iq 
4,76)1, This predicate *ceeps onlv samples at a heioht within tha 
ranaa of 3, IP feet to ^,76 feet. Since onlv points fro" the wall, 
door, and Picture should he found at this nelaht, the validation 
phase need only distinguish amono thase three oblects, Tha results ox 
validation are shown In Ploure 31, The system then grows the region 
Shown In Figure 32 with the predicate fLA‘'Rhfl f<) fAtm (PlCP X) f^'OT 
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FIGURE 30 INITIAL ACQUISITION REGION 
FOR PICTURE 


FIGURE 31 PICTURE POINTS AFTER 
VALIDATION 



FIGURE 32 FINAL PICTURE REGION 
AFTER REGION GROWTH 
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( "fALl P X)))K prrp * is * t.tSp orpdlc^te fnnrtlon, q^n^rated rv thp 
svsta'" Jihich returns T If tn® sa^rJe has the prnrerties nf the 
Dlcture# #'AM P is » sl"'ilar predicate fnr the ’'’all. the cnmiete 
■arowth predicate renijlres tnat the sa-rpie lock 11i<e a olrture point# 
b’Jt not nice a wall point. 


Ip Klanres thmuin 37, the svste-x locates the 
Chair seat apd barV, In Flaure 33# the sanoied se®ne has been 
filtered for oolnts on the chair naeic# uslnn tn® eradicate (LA^pda 
C<) (AMD (LI‘'ITP HFir.HT X 1 .p? 7.1,4) (M’HTP ’-IltF X 0.0 A.im. This 
oredlcate# chee^lnn helant and hUe, selects only the slnale sanoie 
shown In the rloht middle ot the chalrbacv in Flaure 33. The sample 
is retained durlno validation, and is used as a startino nolnr for 
the reolen arower whipb oenerates and 'ises a predicate to 
discriminate the sarples of tne chalrharx from those of adlacent 
table and *all areas resuitlm In toe reolon shown in Floure 34. 

The acpulsltlon, validation, and houndlno process for 
the chair seat Is shewn in ^ inures 3*> tnrouoh 77, The renton sho'wn in 
^Inure 3b is th, r^smt gf fliterina with the predicate fLAMPOA fXi 

* These predlcofas are ccelunctlons o' f-Jll ranoe nredlrates far tne 
Obdects in question. Tnat is. thev chec^ thaf the value of each 
attribute of a sample falls within t^e nossihle ranoe of values for 
the ohiect In ouestion. 
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FIGURE 33 SINGLE POINT ACQUIRED 

AND VALIDATED ON CHAIR BACK 
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(AND (LIMITP nfTGHT X 1,09 l.f.2) (LIMITP npTFNT X n.P (>)))$ which 
selects samples with heloht and orientation aporooriate for the chair 
seat. One point is eliminated durino validation# since Its color does 
not match that of the seat# olvino the reainn shown in Floure ia, j 

Finally# Figure 37 shows the results of orowlno with a oredlcate that 
distinguishes the seat from the floor and fh* wall. 


l 
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FIGURE 37 FINAL CHAIR SEAT REGION 
AFTER GROWTH 


OF POOR 


106 


6 


DISCUSSIOM- 


Thls reoort has discussed experl-nents in autoiratlc and 
cooperative - (man-Tachi-nel scene analvsls, u'sino Interactive 
taciiit-les of ISIS ‘'and associated suhsvstems» Slonificant results- in 
the past year' inciudei (1) developfent of techniques . for cooperative 
scene analysis ana their successful apniieatlon to landscape scenes', 
(?) ■dev>eiop"'enr - of- a new interactively tralnahle 'semantic r.eqion 
qrowlno' oar'a'dlom (based' on *1) and (3) development, of a program 'that 
can.' automa.ticanv ‘ oenerate s'tratenies for finding or dlstinouishino 
obiects , 'Olveh pictorial examoies. 

The biggest- guestlons reoardlnq the utility -, 0,1 , -.our • 
Interactive methodology concern ' It'S generality, ' hur results, so 
far » are' in-depth analyses' of a few selected pictures from three 
domains. In the coming veer, we will begin seriously to investigate 
generality on t'wo fronts? (M the apo i icaoi 1 1 tv of^ strategies and 
semantics developed .- in one ' picture to other -pictures in the same 
domain, and (31 the ' a'po'} Icab i 1 1 tv of the - basic methodoloov to 
additional domains, ineludlno several biomedical aoD-l icatiehs already 
underway , 


The number of variables charact.er iVtng both scenes and scene 
analvsls strategies ; present. a bta ‘ obstacle to systematic 
experimentation, and could explain why s.u.ch experimentation has been 
generallY avoided. To rite one examole, ' the effectiveness of 
manually sunplled semantic kernels on segmentation depends, among 
other things, on whjeh regions are- selected as kernels, on the order 
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In wMch t*iev are ehosen» and on the tvoe of first partition belnn 
used. Exoer I’flentation Is also relatively tl">e eonsu’i'lna on a heavllv 
loaded time Shared system. For these reasons the experimental 
Process must be automated to run on an annotated library of correctly 
analyzed pictures, vSuch .a data base win allow exoer.lmenters to 
test the aeneralltv of proposed semantic descriptions In multiple 
Scenes, It. win be used to monitor the performance of automatic 
scene- analysis strateoies and alert the experimenter when an error Is 
about to occur. The data base' can even supply reP.uested 
interpretations when testlnp interactive proorams. The data base win 
be developed usinq Interactive techniques to obtain perfect 
segmentations « 

The. conttnuino qlobal obleetlve of this prelect Is to 
facilitate the process whereby a computer acquires the Icnowiedoe 
needed to- analyze a set of scene* from a common domain. Our coals- 
for the cominq year are set forth- In the followino seenarioi a -user 
selects a representative scene from a ne» domain, erudeiv circles the 
principle reolons on a disnlav# and provides thetr Interpretations., 
Thp system then completes the seomentatlon, perhaps requestlnd a few 
additional interpretations , The resuitlno oartltlon is then used by 
the system to train classification alnorlthms and to deduce 
contextual constraints, nsinq this knowiedqe, the system, next 
attempts automat teal ly. to partition additional scenes from the same 
domain.. The user modifies the system's Vfnowiedoe base when errors, 
are observed. 

The above scenario Js based on clearly defined extensions of 
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present capahil it ies , WorVr has already becjn on reflnina an 
Initial crude outline into a complete partition (see Section 
The semantic reoion arovino paradlom jfiil be aiitoxated in oradnal 
steps alono lines suqaested in Section 4.7, TnitlallVr reoions 
reachlna critical sir.e win be automatical! v classified with resoect 
to tralnlnq reoions. However, human assistance wjjt he soUoht to 
resolve ambiouities , The second step win attempt to resolve 
ambifluities automatically, usino additional, Interactively formulated 
region attributes fe.o., snaoe and texture! and contextual 
constraints, An implemented constraint satisfaction system win 
continuously monitor the consistency of newiv proposed region 
Interpretations with those nreviouslv assigned, in order to resowe 
both current and pre-existino amblouitles MS], The third and most 
tentative step will be to use the methods of Section 5 to deduce 
region descriptions and conte>^tUai constraints automatically from 
examples in a correctly segmented scene. 
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APPENDIX A1 


CTST CONFIDFNCfi:. 


This antx^nrtix defines the concents of cost and reliability 
Cor confldencel vised In eValwatno detectors# and. derives their 
computational tornulas. 


Copif j dence 

Tn the conrse of plannino a strateoy# or exa^minlno a picture 
interactively, an ifrcortant question frea'ientiy arisesj "Given a 
Particular set of attribvite ireasurenents op an item (a reqion or 
sample), vhat set of objects could it possibly belono to?" If the 
measurements have already been ta^cen from some reolon cor sample), 
the answer to .the question allows the syste" to "classify" .the item 
as helonqiro to some element of a set of objects. If the sYst»m is 

' J ^ 

considerino aererat.im seme detector for taKlno the measurements, 
then the answe,r to the question allows an estimate of how weH t.he 
detector should worW. 

In addition to ><nowina the s*t nf Possible objects which 
could provide the m.pasurements , it is also important to tcnow the 
relative livcellhood of each object in the set. For examnle, if the 
system kcnows that t^e local csur.facei orientation of a sample in 
horizontal, then it should Icnow not only that it heionqs to either 
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tablfttop or floor* ^'Ut also that it is nore llicelv <"0 belong to t^e 
floor since the floor (usually) takes un mijrh irore of the innaoe than 
floes the tabletoo, Tf the syste*^ also T>easures the height of the 
sample to be 2 1/2 feet* then it should realize that the sample has 
to belopo to tabletop, since that Is the onlv oblert which has both 
properties. Finally* if the measurements soan a wide range of values, 
then "several objects' attribute ranges mav overlan the measured 
range, Jn this case, the system should take into account the amount 
of overlap, 

we can formulate a set of requirements for a system which 
answers our original nuestion. The system ShoUli account for a nriorl 
probahilitles of the item belonging to an oblect, should be able to 
handle eomblnatiops of attribute measurememts , and should he able, to 
measure and use the degree of attribute range oVerlaP, 

The prooram that satisfies these regulrements , CO’JF, measures 
the confidence that- -a sample belongs to an object, based -on a set 'of 
detector outcomes, P recursive Baves procedure is used in the 
comoutat j ors [2S1 , Attribute measurements are taken as independent 
events which are linked together through object descriptions. That 
is. If no Object desrrtntlons were available* the measurements would 
b<» truly nrobaballst icall V Indenendent events, dowever, having object 
descriptions allows the system to compute dependent Drohablliti'*s , 

Before beginning the discussion of the procedure* some 
notation is reaulred. The set of outcomes ot the application of a set 
of detectors, {D„. will he written# D" . A detector, * 
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since the outco'T'e of a cletector is affected only by tne oh-je'cts it is 
con^iitioned on> and not bv the outco’^es of other detectors, 
P(D lo , D"‘^) is reduced to P(d loj end the exoresslon becomes? 

' n J n I 


P(DjOj) X P(0JD"-M 

P(0.lD") s 

PCD^ID^'M 


This Is the usual wav of writlna the recursive Baves expression, 
P(D„lO|) Is the probablJltv that a ranlOT saitole fro’’' object* o, » 


will have outcome 

D' When 

n 

the detector Is applied,' 

This 

probahli 1 tv 

is computed from 

stored 

data 

bV summlna 

the 

samples 

that would 

produce the oiven 

outcome , 

and 

dividina by 

the 

total 

number of 

samples measured 

, The 

data 

used for 

this 

step 

come from 

chafacterlzat ions' 

oeretated 

hv 

the System 

from 

examples of the 

ohiect , 







The term. 

P(0 1D"‘ 

*) 

Is Computed 

recursively. 

termlriat ipo 


with p(Oj!d^ 5 which is exoarded 1n the normal wav» 


P(0,lD^) 


PtD^lO,) X P(0.) 
P(D,) 


(3) 


p(0,) is the a rriori llk-elihond nf a randomly selected samnie 
helonolncf to o, , This is usually computed hv comparlno the expected 
oroiected tw’o-dinensi'onal area of the ohiect with the area of the 
imqrje, PlOj) also reflects the set of oblects expected to be present, 
Bv settino 'P(Oj) to zem, tne ohiect Is effectively eliminated from 
cons 1 deration , 
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To conPUt** PlD.,) » P(D,IOj) is su'»''Pod over all oblects.- 

P(D,) = 2 P{D, lo,) X P{0.) f45 

o 

1 

Wow, with the derivation of P( 0 ,Id,) , and therefore the derivation of 
p( 0 ,lD'''’) , the final details for the ortolnal • coi^putatiop of 
p{0,lD") can he cowpleted, 

T.he last term to he expanded is p{d^1d"'’) , This is where; 
the detector outcomes are linked throuan. the obieet descriptions. As 
in Eauation 4» the term win be expanded over- all oblects, 

R(d^Id"'M = 2 P(D^ID'’■^ Oj) X P(0 Id"-’) 

o 

J 

As mentioned previously* the outcome of a detector depends on 
oblects, not on other detectors exceot insofar as they select 
oMects, Therefore# the term p(d^Id"'’, 0^) is reduced to P{D^io.) . 
and 'the expression is rewritten. as 

P(DjD"-^.= 2 P(D„I0|) X P(OjlD"-^) csi 

0 

I 

Tnis expression is Eauation 4, conditioned on the remalnina set of 
detectors, d""’ * 

Several points need to be emphasized, Tt is -important to- be 
able to limit the- set .of oblects under consideration, Llml-tlna th-is 
set allows the- svstem to reduce its window into t.he scene# from a 
fvill imaqe, to a selected subimaoe, Thl.s reduction typically comes 
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about when .-the system locates some ohiect an-1 then 'looks In the 
imme«!lat-e vicinity .for other oblects. The initial object then 
orovi-aes a new window -into the scenes usually with sn appreciably 
smaller subset of objects that need to be considered, 

Ano.ther imporfant- point Is that the derivatlors use the 
outcomes of the set of detectors, and do not denend on the value of 
the outcomes. That Is, the eomoutatiohs do not reouire that all the 
detectors accept the sample, but only that there Is some outcome. 
This fact allows for the possibilitv of oeneratino decision trees 
Where one branch of the tree is taken for an acceptance, and the 
other for- a rejection. Althouah the system does not currently 
dene-rate decision trees tbut only conjunctions that require that all 
outcomes, are accenances), there are no theoretical barriers, 

COST- 

In this discussion, cost will ne the anticipated cost 
(measured in milliseconds of CPU tlmei of aoplylnc a detector to a 
samnie (regions are pot applicable herei. In the discussion of 
confidence, the order of anoUcation of detectors was immaterial,, 
since all outcomes were needed. In this discussion, .detectors will be 
limited- to composite detectors, i,e,, conjunctions of simple 
detectors aop-lied seouential ly. The cost -of application of a 
composite detector to an image is hiohiv order dependent, since a 
rejection by one detector In a sequence i.moiies that the remainino 
detectors in the seouence need not be aoclled. 
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Most e£ the rotation required here' was defined in the 
oreredino section, l<et 0^(0") he tne Per se'"Ple cost of aPnlyino 
detector seouence, d" » and let c^(D,) he the per samrle cost of the 
slnale detector* d. , The slnole detector costs are measured 

nreviouslv by the syste-r.. 

The "cost of application of the detector sequence* d" » Is 

CjlD") - Cj(D,) + CJDj) X P(D,) + C^lDg) x P(D,, Dj) + • 

" . 1 

= S Cj(D.) X P(D''M , (fi) 

i = 1 

D, is always applied, A fraction of the samples tested with d, 
(l.e.f those accepted) will also be tested with Dj , The percentaoe' 
of samples passed by and ^'Hl also be tested bv Og * and so 
on . 


The cost of application of a detector to the Imaoe Cor to a 
window) is the per sample cost of the detector tines toe number of 
samples to he tested. Since tne number of samples to be 'tested • also 
depends'' bn the detector* that number is derived he.re. It Is assumed 
that a certalh ajyen number of samples on th* tarnet ohiect should be 
accepted hV the detector. This numoer* oreset py the user, is , 
From Np and sore oh'lect data* it Is possible to compute a sampiinn 
density* s, which win provide a sufficient number of samples on 
the oblect* such that n should be accepted bv the detector, ‘ is 
the total number of samples that should fall on the ob’dect* olven a 
sampilnq density, 5 , and the expected oblect size* 'S(O), 

Ng. = 5 X S(0) ( 7 ) 
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But slncft enlv the fractlnn* p(d"! 0) » of env rflOdoB* sampling cf H ran 
be exoected to he acceoted bv D"t 

Nj, = X P(D"[0) . (8, 

Therefore# 

Np = 6 X S(0) X P(D"IO) , f<9) 


and 


S(0) X P(D"!0) 


Since the syste"> does not retain the correlated data necessary to 
deter>rine P(D"!0) , it instead uses the mintT'im nf these set of 

probabilities, |p(D,IO) P(D^IO)} , 

.With 5 » the total nunher of 5a'"Dles, t.o be checked In 

a window, W, Is the window sire tines 5, or 

N^^ = 5 X S(W) . no) 

This gives a total onst of 


C(D") = X C,(D") . 


N 


D 


P(D"I01 


S(W) 

X X C (D 

S(0) 


") 


m ) 
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